
Californian antibody test finds only 1.5% of self-selecting group in highest-risk county actually had coronavirus

“Coronavirus may be far more widespread than known”. Or not.

Yet another small, non-random study claims that “48,000 – 81,000” people in Santa Clara County had coronavirus and didn’t know it, but all the study may really show is the power of motivated reasoning.

The Santa Clara study picked the county with the highest number of Covid cases in California, then advertised on Facebook for people to come forward for an unvalidated test. The results were then adjusted upwards, extrapolated to the whole county, converted into headline-grabbing ratios, and used to calculate case fatality rates.

Advertising for participants creates an obvious selection bias straight away: people who thought they might have had coronavirus were surely more likely to want to get tested, while people who knew they didn’t have it (because they had self-isolated) might not want to turn up, stand in a queue, and risk catching coronavirus while they waited.

Basically, they found that 50 people out of 3,330 tested positive, and about half of those positives were likely to be false. They weighted the sample by zip code, race and sex but, for some reason, didn’t adjust for age, a defining characteristic of both infection and fatality rates. Then they estimated fatality rates anyway.

Effectively, the study found that only 1.5% of a group who probably thought they had had coronavirus had actually had it. With zip-code adjustments, the rate was lifted to between 2.5% and 4.2%.
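To see what the raw numbers imply, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted in this post: 50 positives out of 3,330 tests, and the test’s claimed specificity of 99.5% (95% CI 98.3-99.9%) cited in the comments below.

```python
# Back-of-the-envelope check: how many of the 50 positives could be false?
n_tested = 3330
n_positive = 50

print(f"raw positive rate: {n_positive / n_tested:.1%}")  # 1.5%

# Expected false positives across the claimed specificity range:
for specificity in (0.999, 0.995, 0.983):
    expected_fp = n_tested * (1 - specificity)
    print(f"specificity {specificity:.1%}: ~{expected_fp:.0f} false positives expected")
```

At the claimed 99.5% specificity, about 17 of the 50 positives would be expected to be false, and at the lower bound of the confidence interval the expected count (~57) exceeds every positive the study found.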

The headlines took a bad study and made it worse:

Stanford University antibody testing finds California virus infections are 50 TIMES higher than reported – suggesting COVID-19 is more widespread across the US

Coronavirus spread: Number of people infected by COVID-19 may be 50-80 times higher than official count

Commenters under the preprint are unimpressed. Quite a few point out that, given the false positive rate, it is not possible to be sure there were any true positives at all. One commenter defended the published confidence intervals by arguing that the people complaining about false positives had forgotten to account for the false negatives. We know how good a study is when it needs the false negatives to counteract the false positives just to show it found more than zero.
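That defence rests on the standard correction for imperfect tests, the Rogan-Gladen estimator, in which a sensitivity below 100% shrinks the denominator and inflates the estimate. A minimal sketch, assuming a sensitivity of about 80% (roughly what the preprint reported; treat the exact figure as an assumption here), shows how it behaves across the specificity range:

```python
# Rogan-Gladen correction for imperfect tests:
#   true prevalence = (observed + specificity - 1) / (sensitivity + specificity - 1)
def corrected_prevalence(observed, sensitivity, specificity):
    return (observed + specificity - 1) / (sensitivity + specificity - 1)

observed = 50 / 3330  # 1.5% raw positive rate

for spec in (0.999, 0.995, 0.985):
    # Sensitivity of 80% is an assumed round figure for this sketch.
    est = corrected_prevalence(observed, sensitivity=0.80, specificity=spec)
    print(f"specificity {spec:.1%}: corrected prevalence {max(est, 0.0):.2%}")
```

Dividing by a sensitivity below 100% pushes the estimate up (that is how the false negatives “counteract” the false positives), but if specificity is 98.5% the corrected prevalence collapses to essentially zero.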

mendel, 3 days ago:

First, he picked the county that had the earliest cases in California and had the outbreak first, ensuring that the population would be undertested. This means it’s likely that every other county in California has fewer unregistered infections than Santa Clara.

Second, study participants were people who responded to a Facebook ad. This is a self-selected sample, and this property completely kills the usefulness of the study all by itself. This is a beginner’s error! People who think they had Covid-19 and didn’t get tested, or who know someone who did, are much more likely to respond to such an ad than people who did not….

Third, age is the single most important predictor of mortality. He did not weight the results by age, and old people are underrepresented in the study. Anything he says about mortality is completely useless if we don’t know how prevalent the infection was in the older population. (In Germany, cases show that the prevalence among tested older people was low initially and took a few weeks to rise.)

Fourth, instead he weights prevalence by zip code. Why? This exacerbates statistical variations, since there were only 50 positive results and Santa Clara has ~60 zip codes. If a positive result happens by chance to fall in a populous zip code where only a few people participated, the numbers are skewed up. They must have seen this happen, because their estimated prevalence is almost twice as high as the raw prevalence. [A toy simulation after this comment illustrates the effect.]

Fifth, the specificity of the test is “99.5% (95% CI 98.3-99.9%)”. This means that, theoretically, if the specificity were 98.5%, all 50 positive results could be false positives and nobody in the sample need have had Covid-19 at all. So the result is not statistically significant even if the sample had been well chosen (which it wasn’t). (It’s not even significant at the 90% level.)

Sixth, they used a notoriously inaccurate “lateral flow assay” instead of an ELISA test and did not validate their positive samples (only 50) with a more sensitive test — why not?

Seventh, the Covid-19 antibody test can create false positives if it cross-reacts with other human coronavirus antibodies, i.e. if you test the samples of people who recently had a cold, your specificity will suffer. Therefore, a manufacturer could a) test blood donor samples, since people are not allowed to give blood if they have been sick shortly before; or b) test samples taken in the summer, when people are less likely to have colds than in March.

To state the previous three points another way: a large number of the positive results (a third if the specificity really is 99.5%, but probably more than that) are fake, and depending on which zip codes they randomly fall in, they could considerably skew the results.
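mendel’s fourth point, the zip-code weighting, is easy to illustrate with a toy simulation. Every figure below (per-zip participant counts, zip populations) is invented for the sketch; it only demonstrates how scattering 50 positives across ~60 unevenly sampled zip codes makes a population-weighted estimate swing while the raw rate stays fixed.

```python
import random

random.seed(0)

# Toy model: 50 positives scattered over 60 zip codes with uneven participation.
# All figures here are invented for illustration.
n_zips, n_pos = 60, 50
tested = [random.choice([10, 30, 100]) for _ in range(n_zips)]  # participants per zip
population = [random.choice([5_000, 20_000, 80_000]) for _ in range(n_zips)]
total_pop = sum(population)

raw = n_pos / sum(tested)
weighted_estimates = []
for _ in range(10_000):
    pos = [0] * n_zips
    # Each positive lands in a zip with probability proportional to participation there.
    for z in random.choices(range(n_zips), weights=tested, k=n_pos):
        pos[z] += 1
    # Population-weighted prevalence: per-zip rate times that zip's population share.
    weighted_estimates.append(
        sum(pos[z] / tested[z] * population[z] / total_pop for z in range(n_zips))
    )

weighted_estimates.sort()
print(f"raw prevalence: {raw:.2%}")
print(f"weighted estimate, middle 95% of runs: "
      f"{weighted_estimates[250]:.2%} to {weighted_estimates[9750]:.2%}")
```

The raw rate never changes, but the weighted estimate jumps around purely because of where the handful of positives happen to land, which is exactly the mechanism mendel describes.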

Spacecat56 asks what happened to the data collected on prior symptoms:

The draft acknowledges the possibility of this bias but tosses it off as “hard to ascertain”. But the draft also says that data on prior symptoms were collected; data which are entirely omitted both from the published analysis and from the published tables.

Because the analysis ignores this factor and because of the potential for this bias to totally dominate the analysis, in my opinion after reading the study draft, we still know effectively nothing at all about the prevalence of infection in the studied population.

Were people who had been sick in the last month more likely to volunteer for testing?

Antibody tests are unreliable at this stage: there may be cross-reactivity with common cold coronaviruses, and apparently the researchers used only 30 “pre-Covid-19” blood samples to rule that out. Could this be right?

Animesh Ray, 2 days ago:

This manuscript should not have seen the light of day in this form, let alone be published even in pre-print format, given the sensitivity of the topic.

Here is the reason: the common cold coronaviruses that could potentially cross-react with existing pre-COVID19 IgM/IgG are quite prevalent in the population. To address this, the authors tested 30 pre-COVID19 sera.

Given an unadjusted detection rate of 2.8% seropositives in post-COVID-19 samples, if all were false positives, they needed to test, for 99% confidence, a MINIMUM of log(0.01)/log(0.972) = 162 pre-COVID19 sera of similar demographics (age/sex/location).

Instead, they tested only 30!

On this basis I cannot attach any value to this report.
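Ray’s arithmetic is easy to check. The sketch below reproduces it: if the true false-positive rate were 2.8%, high enough to explain every positive in the study, how many known-negative pre-Covid-19 sera would all have to test negative before you could be 99% confident the rate is really lower?

```python
import math

fp_rate = 0.028  # a false-positive rate high enough to explain every positive

# P(all n pre-Covid-19 sera test negative) = (1 - fp_rate)^n; require this < 1%.
n_required = math.log(0.01) / math.log(1 - fp_rate)
print(f"minimum sera required: {n_required:.0f}")  # ~162, as Ray calculates

# With only 30 sera, an all-negative run proves very little:
print(f"P(30 of 30 negative at a 2.8% false-positive rate): {(1 - fp_rate) ** 30:.0%}")
```

Even if the test were bad enough for every one of the 50 positives to be false, there would still be about a 43% chance that all 30 validation sera came back negative.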

Other commenters note that the test packet warns that other common cold viruses may trigger positive results.

The lead author was writing in The Wall Street Journal, before the tests were even done, that “projections of the death toll could plausibly be orders of magnitude too high.” As commenter Andy asks: were they just shooting for what they wrote in the WSJ?
