What your doctor is reading on Medscape.com:
MAY 07, 2020 -- You're asked to measure a lamppost's height with an inaccurate ruler. If you know the ruler's error, say it always overestimates height by 5 cm, you could still measure the lamppost accurately. If you don't know how inaccurate the ruler is (or even that it is inaccurate), you have two unknowns ― the height of the lamppost and the error of the ruler. In this situation, the ruler measures the lamppost, and the lamppost measures the ruler. Mathematician and philosopher Nassim Taleb names this duality ― when the measuring instrument is measured by what it tries to measure ― Wittgenstein's Ruler.
In the COVID-19 pandemic, two lampposts need measuring: the SARS-CoV-2 community prevalence, and the infection fatality rate (IFR). The IFR isn't the same as the (symptomatic) case fatality rate (CFR), which is the fatality rate among those diagnosed with the infection ― ie, those with symptoms severe enough to seek medical attention. The denominator for CFR omits those who had COVID-19 without symptoms or mild symptoms that were dismissed.
The CFR overestimates viral lethality but underestimates its contagiousness. Its denominator is easier to calculate than that of the IFR, whose denominator, community prevalence, eludes RT-PCR testing, which detects active infection only. For IFR, a test must uncover those who had it rather than those who have it.
The latest beacon is serology, which is the diagnostic analysis of our blood. In response to coronavirus, our immune system produces antibodies, a rebellion of sorts. The rebels hang around for a while after the fight. Serum antibodies, an imprimatur of the virus, can detect undocumented infections.
Determining community prevalence of COVID-19 is straightforward, at least in theory. Say 50 people out of a random and representative sample of 1000 have antibodies to the novel coronavirus. Then the prevalence of COVID-19 is 5%. Not everyone in the community need be tested to determine prevalence. But it's not so simple.
First, samples are seldom truly random or genuinely representative of the community, which means that raw prevalence estimates must be corrected for sampling biases. The adjusted prevalence becomes an approximation of the truth.
The more intractable problem brings us back to Wittgenstein's Ruler ― our measuring instrument is imperfect. Any test has two attributes: sensitivity ― its ability to find disease in those with the disease; and specificity ― its ability to correctly identify those without the disease. Imperfect sensitivity leads to false negatives. Imperfect specificity leads to false positives.
To estimate community prevalence accurately, a test should have high sensitivity and specificity; specificity is more important when the prevalence is low. This is best explained with examples.
Distanceville: Social Distancing Is in Our Name
Consider Distanceville, a town of 10,000, where people practice social distancing and where the community prevalence is 1%, ie, 100 residents had COVID-19 and 9900 didn't. Now let's compare two antibody tests, Test A and Test B.
Test A has a sensitivity of 90% and a specificity of 95%. Of the 100 Distanceville residents with COVID-19, 90 will test positive with Test A. Of their 9900 neighbors without COVID-19, 9405 will test negative (0.95 x 9900), and the remaining 495 will falsely test positive. The total number of positives, true and false, is 585, which is almost six times the true count of 100.
Note that the positives are dominated by false positives. Test A misses 10 people with COVID-19 because of imperfect sensitivity, but this error is dwarfed by the error from imperfect specificity by a factor of nearly 50. False positives overestimate the community prevalence.
Test B is both worse and better than Test A, with a sensitivity of 80% and a specificity of 98%. Of 100 Distancevillers with COVID-19, 80 will test positive with Test B. Of the 9900 without COVID-19, 9702 will test negative, and the remaining 198 will falsely test positive. The total number of positives, true and false, is 278, nearly three times the true count. In this case there are roughly 2.5 false positives for every true positive. Although Test B misses COVID-19 in 20 people, this error is less overwhelmed by the error from imperfect specificity, down from a factor of nearly 50 to about 10.
Test B is far from perfect and still overestimates prevalence significantly, but less so than Test A because of a modest improvement in specificity (3 percentage points), despite a larger reduction in sensitivity (10 percentage points).
Closeville: Social Distancing Is Not in Our Nature
Let's move to Closeville, a town of similar size whose populace doesn't practice social distancing with the same diligence as the residents of Distanceville. The community prevalence is naturally higher, at 10% ― 1000 people had COVID-19, and 9000 didn't.
Of the 1000 who had COVID-19, 900 will test positive with Test A, which, recall, has a sensitivity of 90% and a specificity of 95%. Of the 9000 without COVID-19, 8550 will test negative, and 450 will falsely test positive. The total number of positives, true and false, is 1350, or 1.35 times the true count. Now there is only one false positive for every two true positives, whereas in Distanceville there were more than five false positives for every true positive. Test A misses 100 people with COVID-19. The error from imperfect sensitivity is overwhelmed by the error from imperfect specificity by a factor of 4.5 ― a dramatic reduction compared with Distanceville.
Test A better estimates the true prevalence in Closeville than in Distanceville because the prevalence of COVID-19 is higher in Closeville. Test B better estimates the true prevalence in Distanceville than Test A because of its higher specificity.
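The Closeville comparison follows from the same arithmetic; here is a short sketch, again using the article's hypothetical figures:

```python
# Closeville: 10,000 residents, 10% true prevalence, tested with Test A.
n_infected, n_uninfected = 1000, 9000
sens, spec = 0.90, 0.95

true_pos = n_infected * sens            # 900 infected people correctly flagged
false_pos = n_uninfected * (1 - spec)   # 450 uninfected people flagged anyway
missed = n_infected * (1 - sens)        # 100 infected people the test misses

print(round(true_pos + false_pos))      # 1350 total positives, 1.35x the true 1000
print(round(false_pos / true_pos, 2))   # 0.5 false positives per true positive
print(round(false_pos / missed, 1))     # 4.5: specificity error vs sensitivity error
```

Running the same numbers at Distanceville's 1% prevalence gives 5.5 false positives per true positive, which is why the identical test looks so much worse there.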
Just as an imperfect ruler can still measure height accurately if its error is constant and known, serology need not have perfect sensitivity and specificity to estimate COVID-19's prevalence. But often we don't know the sensitivity and specificity of a test, and sometimes these attributes, notably specificity, vary across different populations. This is because the factors that confuse the test and create false positives, such as cross-reactivity with antibodies to other coronaviruses, can vary across different towns. For example, the test's specificity could be lower if Closeville had more people who had suffered the common cold than Distanceville did. Our handicap isn't the test's imperfect specificity; it's that we don't know the degree of imperfection.
How can we measure COVID-19's prevalence if we don't know the true specificity of serology? What if its specificity is so abysmally low that the false positives overwhelm the true positives by a factor of 20? Here, we have to apply the duality of Wittgenstein's Ruler. Instead of using serology to estimate the prevalence of COVID-19, let's use the prevalence to estimate the specificity of serology.
The Santa Clara Seroprevalence Study
Take the Santa Clara study, in which researchers used antibodies to estimate community prevalence. The study has been rightly criticized for its flaws. Volunteers for serology were inveigled from Facebook ― as popular as this medium is, it's neither random nor representative of the community as a whole.
Nevertheless, the study is actually informative, though not in ways many think. Of the 3330 tests conducted, 50 were positive. We don't know what proportion of these are true positives because we don't know the true specificity of the serology test (despite the study authors' validation claims). Let's work backwards and apply reductio ad absurdum. Assume the prevalence of COVID-19 in Santa Clara is zero ― we realize this is a bold assumption, but stay with us. All 50 positives are now false positives. This means that the test still correctly classifies 3280 of 3330 people as being negative, yielding a specificity of 98.5%.
This doesn't mean that the specificity is actually 98.5%. It means that it's at least 98.5% in this cohort. Specificity may be different in other cohorts. If it were lower than 98.5%, there'd be more false positives and, therefore, more total positives. The total number of positives fences off the lower bound of specificity.
We want better specificity because improving the specificity even by a whisker substantially reduces prevalence overestimations when the real prevalence is low. Let's assume the true prevalence of COVID-19 in Santa Clara is only 1%. From a representative sample of 3330 people, 33 would have COVID-19, and 3297 wouldn't. The false positives would drop from about 49 to about 16 if specificity improved from a commendable 98.5% to a near-perfect 99.5%.
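Both the reductio ad absurdum and the whisker-of-specificity point can be checked against the Santa Clara counts; the 1% prevalence in the second step is an assumption, as in the text:

```python
# Step 1: assume zero true prevalence, so every positive is a false positive.
# The positive count then pins down a lower bound on the test's specificity.
tests, positives = 3330, 50
spec_lower_bound = (tests - positives) / tests
print(round(spec_lower_bound, 3))        # 0.985 -> specificity is at least 98.5%

# Step 2: assume a 1% true prevalence instead, so 3297 of the 3330 sampled
# are uninfected, and compare false-positive counts at two specificities.
uninfected = 3297
print(round(uninfected * (1 - 0.985)))   # ~49 false positives at 98.5% specificity
print(round(uninfected * (1 - 0.995)))   # ~16 false positives at 99.5% specificity
```

If the true prevalence were above zero, some of the 50 positives would be genuine, and the specificity would be even higher than the bound ― which is the sense in which the total positives "fence off" the lower bound.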
High specificity is critical when estimating community prevalence. "Grade A" is insufficient. It must get an A plus. With our reductio ad absurdum exercise, we can be fairly certain that the specificity of the Stanford test is in the A-plus range. If the specificity were as abysmally low as 90% in the Santa Clara cohort, serology would be dead on arrival, which would be a major victory for SARS-CoV-2.
Saurabh Jha, MD, MS, is a radiologist who spends his professional life dealing with abysmal specificities. He can be reached @RogueRad. Venkatesh Murthy, MD, PhD, is a cardiologist who advocates for higher mathematical literacy among physicians. He can be reached @venkmurthy.