What percentage of Americans are, or have been, infected with the coronavirus? And what is the probability of dying from the virus if you catch it? One of the most unsettling aspects of the COVID-19 pandemic is that these two fundamental rates – the coronavirus infection rate and the case fatality rate – are not known.
But the same approaches used for political polling can be used to answer how widespread and how deadly the coronavirus is.
The simplest way to find out how many Americans have the virus and what risk it poses would be to test every person in the U.S. But resources are limited, and testing for the coronavirus has been much more selective. The Centers for Disease Control and Prevention’s top priorities for testing have been hospitalized patients and medical staff with symptoms, and overall it is mostly symptomatic people who have been tested.
Because of this selective testing, epidemiologists and public health officials in the U.S. simply do not know the true extent of the coronavirus’s penetration into the country. And without knowing how many people have been infected, the case fatality rate and other coronavirus statistics are impossible to calculate. Fortunately, there is a straightforward way to learn how widespread and deadly COVID-19 really is: test randomly.
So why isn’t it possible to calculate the infection and case fatality rates from the millions of COVID-19 tests that have already been performed? The problem lies in who has been tested.
Testing symptomatic patients reflects a classic error in sampling. Researchers want to know who has coronavirus, but since most of those tested have symptoms, medical professionals have been sampling from a group with higher rates of infection. It’s not a good representation of the U.S. population at large.
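A small simulation can make this sampling error concrete. The sketch below is purely illustrative: the population size, the 5% infection rate and the two symptom probabilities are made-up assumptions, not estimates for COVID-19, and the function name `simulate` is hypothetical. It compares the share of positives among symptomatic people with the share in a random draw from everyone.

```python
import random


def simulate(population=200_000, infection_rate=0.05,
             p_symptoms_if_infected=0.6, p_symptoms_if_healthy=0.05,
             sample_size=10_000, seed=0):
    """Return (positive rate among symptomatic people tested,
    positive rate in a simple random sample).

    Every default value is an illustrative assumption, not real data.
    """
    rng = random.Random(seed)

    # Build a synthetic population of (infected, symptomatic) pairs.
    people = []
    for _ in range(population):
        infected = rng.random() < infection_rate
        p_sym = p_symptoms_if_infected if infected else p_symptoms_if_healthy
        symptomatic = rng.random() < p_sym
        people.append((infected, symptomatic))

    # Selective testing: only people with symptoms are eligible.
    symptomatic_group = [inf for inf, sym in people if sym]
    k = min(sample_size, len(symptomatic_group))
    selective = rng.sample(symptomatic_group, k)

    # Random testing: everyone is equally likely to be chosen.
    random_draw = [inf for inf, _ in rng.sample(people, sample_size)]

    return sum(selective) / len(selective), sum(random_draw) / len(random_draw)


selective_rate, random_rate = simulate()
print(f"Positive rate, symptomatic people only: {selective_rate:.1%}")
print(f"Positive rate, random sample:           {random_rate:.1%}")
```

Under these assumed numbers, the symptomatic group tests positive several times more often than the random sample, even though both are drawn from the same population with the same underlying infection rate.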
By testing a large enough number of people randomly, it is possible to get a sample group whose demographics are representative of the whole country.
Public health officials could start randomly picking and testing people from across the U.S., and then following up to see what fraction of those who tested positive died from COVID-19. If random testing is done right, the infection and case fatality rates in the random sample should be very close to the actual rates.
So how many people do you need to randomly test to get data that can accurately describe the whole U.S.? The mathematics behind this question has long been worked out.
Presidential approval polls often sample roughly 1,000 people. This produces a margin of error of about 3 percentage points, meaning that, by the conventional 95% standard, random chance is unlikely to put the result more than 3 points away from the true value.
A margin of error of 3 percentage points may be fine for estimating presidential approval, but it is probably not precise enough for the coronavirus pandemic. If 10,000 individuals in the U.S. were tested, the margin of error would shrink to about 1 percentage point. In practice, these margins of error are conservative, because they assume the worst case, in which about half of those sampled test positive. Actual margins of error from a random sample of 10,000 individuals will probably be much smaller and likely accurate enough to give public health officials useful information about the total number of infected and fatality rates.
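The figures above follow from the standard margin-of-error formula for a proportion from a simple random sample, z × √(p(1 − p)/n). The sketch below assumes a 95% confidence level (z ≈ 1.96) and the worst case p = 0.5. The function name `margin_of_error` is our own, and the 5% infection rate in the last line is a hypothetical value chosen only to show why the worst-case margin is conservative.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a proportion p estimated from a simple
    random sample of size n. p = 0.5 gives the widest margin."""
    return z * math.sqrt(p * (1 - p) / n)


# Worst-case margins, in percentage points.
for n in (1_000, 10_000):
    print(f"n = {n:>6,}: +/- {100 * margin_of_error(n):.1f} points")
# n =  1,000: +/- 3.1 points
# n = 10,000: +/- 1.0 points

# Hypothetical case: if only 5% of those sampled were infected,
# the margin for 10,000 tests would be well under 1 point.
print(f"p = 0.05, n = 10,000: +/- {100 * margin_of_error(10_000, p=0.05):.1f} points")
# p = 0.05, n = 10,000: +/- 0.4 points
```

Because the margin shrinks with the square root of the sample size, cutting it by a factor of three takes roughly ten times as many tests.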
The key is in random selection. A sample of 10,000 Americans is most useful if those being tested are chosen by lottery.
Random testing would also reach people who are infected but not sick, so the rate of asymptomatic cases could be determined as well.
Random sampling could illuminate trends involving geography, ethnicity and other demographic variables before the worst damage is done, and public health officials could enact targeted and nuanced policies to help high-risk groups or regions.
Public health officials have used randomization in other settings, such as monitoring the spread of typhoid fever in parts of Egypt, and it works. The mathematics behind random sampling is foundational to many areas of polling and statistics. The only thing public health officials need to do is figure out the execution. It is possible.
Daniel N. Rockmore is associate dean for the sciences at Dartmouth College. Michael Herron is chair of the Program in Quantitative Social Science at Dartmouth College. Distributed by The Associated Press.