The CDC’s $1.75 billion sequencing boom may be throwing money at the wrong problem
Shortly after President Biden was inaugurated, the man he had tapped to lead his coronavirus response had a message about what America needed to do. “We’re 43rd in the world in genomic sequencing,” Jeff Zients said at a press conference in January. “Totally unacceptable.”
The answer, he suggested, was to “do the appropriate amount of genomic sequencing, which will allow us to spot variants early, which is the best way to deal with any potential variants.”
Scientists have been sequencing the genomes of covid samples since the first identified case in Wuhan; the first mRNA vaccines were built using genetic code released publicly by Chinese scientists in January 2020. And it’s been done at an unprecedented scale. In mid-December, 51,000 covid genomes from the US had already been decoded and posted in public repositories. That’s seven times the number of flu samples sequenced annually by the Centers for Disease Control and Prevention.
The vast majority of covid sequencing in America has been conducted at academic centers. That’s mostly because until recently it was considered an academic pursuit, tracking changes in a virus widely believed to evolve slowly and steadily.
Even in November and December, as both the UK and South Africa announced more transmissible strains and Denmark said it would kill 15 million mink to contain a mutation, many scientists and public health organizations argued that the virus was unlikely to escape vaccine-induced immunity.
Still, as headlines warned about “mutant coronavirus” spreading “out of control,” politicians and the public demanded to know whether there were “variants of concern” in their own backyards.
“Given the small fraction of US infections that have been sequenced, the variant could already be in the United States without having been detected,” the CDC responded, in a statement published online.
“America is flying blind” quickly became a refrain, not only for scientists seeking support for their work, but for critics of the US response looking for a solvable problem. Some frustration was certainly driven by inaccurate messaging: on December 22, for instance, the New York Times reported that fewer than 40 covid genomes had been sequenced in the US since December 1. In reality, US labs submitted nearly 10,000 new sequences to public repositories in that period.
Financial and political support came quickly under the new administration, with the CDC’s $200 million “down payment” for sequencing work. Then the relief bill passed in March dedicated an eye-popping $1.75 billion to support nationwide public health programs sequencing “diseases or infections, including covid–19.”
The CDC and the WHO set a goal of sequencing 5% of positive cases to track variant spread—a number based on a pre-print study from the dominant manufacturer of covid sequencers, Illumina.
The US quickly met that goal, mostly by paying private testing labs to sequence a small number of positive samples. In the last week of March, when there were 450,000 reported new cases, US labs—including academic labs funded through other programs—submitted 16,143 anonymized sequences to GISAID, a global repository of biological data, and 6,811 to the National Center for Biotechnology Information, or NCBI.
(That period was one of the lowest case rates in six months, however; to sequence 5% during the January peak, US labs would have needed to spend well over a million dollars a day sequencing five times the number of samples.)
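The 5% goal can be checked directly against the March figures above. The short Python sketch below is an illustration only, not a CDC or WHO calculation. It uses just the reported counts and makes one labeled assumption: because the reporting doesn’t say how many genomes went to both GISAID and NCBI, it treats the GISAID count alone as a lower bound on distinct genomes and the sum of both counts as an upper bound.

```python
# Sequencing coverage for the last week of March, using the counts reported
# in this article. Assumption (not stated in the reporting): some genomes may
# have been submitted to both GISAID and NCBI, so the number of distinct
# genomes lies between the larger single count and the sum of both.

NEW_CASES = 450_000
GISAID_SUBMISSIONS = 16_143
NCBI_SUBMISSIONS = 6_811
TARGET_FRACTION = 0.05  # the CDC/WHO goal: sequence 5% of positive cases


def coverage(sequenced: int, cases: int) -> float:
    """Fraction of reported cases that were sequenced."""
    return sequenced / cases


def coverage_bounds(cases: int, gisaid: int, ncbi: int) -> tuple[float, float]:
    """Lower bound: every genome in the smaller set duplicates one in the larger.
    Upper bound: the two sets share no genomes."""
    lower = coverage(max(gisaid, ncbi), cases)
    upper = coverage(gisaid + ncbi, cases)
    return lower, upper


if __name__ == "__main__":
    target = round(NEW_CASES * TARGET_FRACTION)
    low, high = coverage_bounds(NEW_CASES, GISAID_SUBMISSIONS, NCBI_SUBMISSIONS)
    print(f"Genomes needed to reach 5%: {target:,}")
    print(f"Estimated coverage: {low:.1%} to {high:.1%}")
```

Run as a script, it reports a 5% target of 22,500 genomes and coverage somewhere between about 3.6% and 5.1%, so whether that particular week cleared the bar depends on how much the two repositories overlap.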
America should be an excellent place to study the genetic evolution of covid. It has widespread infections, a genetically diverse population, and the largest number of vaccinated individuals in the world. But despite the increase in genomic sequencing, some public health experts and scientists are now wondering what’s being done with all this information—and how achievable the field’s goals are.
On its site describing genomic surveillance, the CDC says that sequencing can track whether variants have learned to evade vaccines or treatments. But the agency’s surveillance sequencing program doesn’t connect any of its sequences back to the people they came from, whether they were vaccinated, or how sick they got.
The biggest argument for this kind of anonymous “surveillance” sequencing, meanwhile, is that it gives officials early warning about potential changes in case rates. But even as more transmissible variants have become well established in America, states have been relaxing mask mandates and reopening indoor dining.
We spoke to a number of sequencing experts with firsthand experience during the pandemic and heard the same from many of them: turning surveillance data into useful knowledge faces enormous legal, political, and infrastructural barriers in the US, some of them insurmountable.
Unless scientists and policymakers ask why they want covid sequences, and how best to put that data to use, genomic surveillance will yield diminishing returns—and much of its potential will likely be wasted.
“It’s insanely difficult to do this well in the United States,” says Lane Warmbrod, senior analyst at the Johns Hopkins Center for Health Security. “I would be very disappointed if all this money just went to getting a whole bunch of covid sequences, and no thought went toward building something that lasts.”
What surveillance sequencing can’t do
There’s no question sequencing has been revolutionary for public health, not least because the mRNA vaccines were developed using sequences made public just a month after a man turned up at a Wuhan hospital with a strange illness.
Surveillance sequencing, which identifies the genetic code of a portion of positive tests and looks for changes over time, can help researchers track the virus’s evolution. If one strain increases faster than others, researchers can home in on it for further investigation.
“Our surveillance is imperfect, but we are able to see when and where we’re getting transmission across a region, and identify broad-scale patterns of change,” says Duncan MacCannell, chief science officer of the Office of Advanced Molecular Detection, or OAMD, the CDC office responsible for expanding national sequencing efforts.
When authorities are asked why surveillance sequencing is so important, they commonly respond that it can help track how a strain behaves in the real world. Often, though, such arguments conflate two things: sampling positive tests that have been anonymized, and using targeted analysis to understand specific, identifiable cases.
The CDC’s page on surveillance sequencing of covid variants, for example, claims that “routine analysis of genetic sequence data” can help detect variants with the “ability to evade natural or vaccine-induced immunity” and “cause either milder or more severe disease in people.”
Knowing when variants learn to evade immune systems can tell scientists whether they need to change vaccine formulas. But sequences can’t tell you those things unless connected with information about the people they came from. That’s often impossible under US regulations.
“Just because you’re seeing a variant more often, does that mean it’s actually more transmissible? Maybe,” says Brian Krueger, technical director of research and development at LabCorp, which has a covid sequencing contract with the CDC. “We need to do more science to understand if it’s doing something we’re actually worried about.”
That LabCorp contract is part of OAMD’s primary sequencing program, which pays large testing labs across the country to sequence thousands of positive covid tests. The project is primarily looking for “variants of concern,” strains already suspected to cause worse outcomes or spread faster. It’s also tracking when and where different branches of covid-19’s family tree are spreading, and which genetic changes crop up repeatedly. If one branch of the virus grows quicker than others, or one mutation keeps showing up in different families, it can be flagged for attention.
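Flagging a branch that “grows quicker than others” usually comes down to watching each lineage’s share of sequenced samples over time. The Python sketch below shows one common way to do that, fitting the weekly trend in the log-odds of a lineage’s frequency; it is not OAMD’s actual pipeline. The lineage names, weekly counts, and 0.3-per-week threshold are all hypothetical, invented for illustration.

```python
import math

# Hypothetical weekly counts of sequenced samples per lineage.
# These numbers are invented for illustration; they are not surveillance data.
WEEKLY_COUNTS: dict[str, list[int]] = {
    "lineage_A": [500, 480, 450, 400],
    "lineage_B": [40, 80, 150, 280],
    "lineage_C": [60, 60, 55, 50],
}

# Hypothetical threshold: flag a lineage if the log-odds of its frequency
# rise by more than this much per week.
GROWTH_THRESHOLD = 0.3


def log_odds(p: float) -> float:
    """Log-odds of a frequency; p must be strictly between 0 and 1."""
    return math.log(p / (1 - p))


def frequencies(counts: dict[str, list[int]]) -> dict[str, list[float]]:
    """Each lineage's share of all sequenced samples, week by week."""
    weeks = len(next(iter(counts.values())))
    totals = [sum(series[w] for series in counts.values()) for w in range(weeks)]
    return {
        name: [series[w] / totals[w] for w in range(weeks)]
        for name, series in counts.items()
    }


def slope(ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against week index 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def flag_growing(
    counts: dict[str, list[int]], threshold: float = GROWTH_THRESHOLD
) -> dict[str, float]:
    """Return {lineage: weekly log-odds growth} for lineages above threshold.

    Assumes every lineage has at least one sample in every week and is never
    100% of a week's samples, so every log-odds value is finite.
    """
    flagged = {}
    for name, freqs in frequencies(counts).items():
        rate = slope([log_odds(f) for f in freqs])
        if rate > threshold:
            flagged[name] = rate
    return flagged


if __name__ == "__main__":
    for name, rate in flag_growing(WEEKLY_COUNTS).items():
        print(f"{name}: log-odds rising {rate:.2f} per week")
```

With these made-up counts, only lineage_B is flagged: its share climbs from under 7% to over 38% of sequenced samples in four weeks. Working in log-odds rather than raw percentages makes a steady competitive advantage show up as a straight line, one reason analysts often model variant frequencies this way.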
At the same time, OAMD is collecting raw samples from public health labs around the country to study in the lab, growing the viral samples in dishes and pitting them against therapeutics and patients’ antibodies. Those test tube studies are the source of most recent headlines about variants getting around protection conferred by vaccines. They also tend to dramatically undersell immune protection against illness, which has many overlapping mechanisms.
But because of patient privacy and other requirements put in place for regulatory oversight, all of these samples, as well as all the sequences they collect from labs, are deliberately de-identified: they have no connection to the patient in question.
Taking the 10,000-foot view
According to MacCannell, the OAMD has no intention of contextualizing its de-identified data with clinical information.
“Those contracts are set up to give us the 10,000-foot view,” says MacCannell.
Even if it wanted to combine surveillance sequences with patient information in its analyses, the agency would be fighting a massive uphill battle. In the US, most patient records—test results, immunization information, hospital records—are scattered across many unconnected databases. Whether or not the owners of that data are interested in turning the information over to the government, it would typically require each individual to give consent, a very laborious undertaking.
Instead of trying to work through these issues at the national level, the sequencing contracts allow individual public health agencies to request the names and contact information of people who have tested positive for variants of concern. But that just pushes the same problems of data ownership down the chain.
“Some states are very good and want to know a lot about variants that are circulating in their state,” says LabCorp’s Brian Krueger. “The other states are not.”
Public health epidemiologists often have little experience with bioinformatics, the use of software to analyze large datasets like genomic sequences. Only a few agencies have pre-existing sequencing programs, and even if every agency had one, having each jurisdiction analyze just a small slice of the dataset would limit how much could be learned about real-world behavior.
Getting around those issues, and making it easier to connect sequences and clinical metadata on a large scale, would require more than just root-and-branch reform of privacy regulations, however. It would need a reorganization of the entire US healthcare and public health systems, in which each of the 64 public health agencies operates as its own fiefdom and there is no centralization of information or power.
“Metadata is the single biggest uncracked nut,” says Jonathan Quick, managing director of pandemic response, preparedness, and prevention at the Rockefeller Foundation. (The Rockefeller Foundation helps fund coverage at MIT Technology Review, although it has no editorial oversight.) Because it’s so hard for public health to put together big enough datasets to really understand real-world variant behavior, our understanding has to come from vaccine manufacturers and hospitals adding sequencing to their own clinical trials, he says.
It’s frustrating to him that so many huge datasets of useful information already exist in electronic medical records, immunization registries, and other sources, but can’t easily be used.
“There’s a whole lot more that could be learned, and learned faster, without the shackles we put on the use of that data,” says Quick. “We can’t just rely on the vaccine companies to do surveillance.”
Boosting state-level bioinformatics
If public health labs are expected to focus more on tracking and understanding variants on their own, they’ll need all the help they can get. Doing something about variants case by case, after all, is a public health job, while doing something about them at the policy level is a political one.
Public health labs generally use genomics to expose otherwise-hidden information about outbreaks, or as part of track and trace efforts. In the past, sequencing has been used to connect E. coli outbreaks to specific farms, identify and interrupt chains of HIV transmission, isolate US Ebola cases, and follow annual flu patterns.
Even those with well-established programs tend to use genomics sparingly. The cost of sequencing has dropped precipitously over the last decade, but the process is still not cheap, particularly for cash-strapped state and local health departments. The machines themselves cost hundreds of thousands of dollars to buy, and more to run: Illumina, one of the biggest makers of sequencing equipment, says labs spend an average of $1.2 million annually on supplies for each of its machines.
Health agencies don’t just need money; they also need expertise. Surveillance requires highly trained bioinformaticians to turn a sequence’s long strings of letters into useful information, as well as people to explain the results to officials, and convince them to turn any lessons learned into policy.
Fortunately, the OAMD has been working to support state and local health departments as they try to understand their sequencing data, employing regional bioinformaticians to consult with public health officers and facilitating agencies’ efforts to share their experiences.
It is also pouring hundreds of millions into building and supporting those agencies’ own sequencing programs—not just for covid, but for all pathogens.
But many of those agencies are facing pressure to sequence as many covid genomes as possible. Without a cohesive strategy for collecting and analyzing data, it’s unclear how much utility those programs will have.
“We’ll miss a ton of opportunities if we just give health departments money to set up programs without having a federal strategy so that everyone knows what they’re doing,” says Warmbrod.
Initial visions, usurped
Mark Pandori is director of the Nevada state public health laboratory, one of the programs OAMD supports. He has been a strong proponent of genomic surveillance for years. Before moving to Reno, he ran the public health lab in Alameda County, California, where he helped pioneer a program using sequencing to track how infections were being passed around hospitals.
Turning sequences into usable data is the biggest challenge for public health genomics programs, he says.
“The CDC can say, ‘go buy a bunch of sequencing equipment, do a whole bunch of sequencing.’ But it doesn’t do anything unless the consumers of that data know how to use it, and know how to apply it,” he says. “I’m talking to you about the robotics we need to get things sequenced every day, but health departments just need a simple way to know if cases are related.”
When it comes to variants, public health labs are under many of the same pressures the CDC faces: everyone wants to know what variants are circulating, whether or not they can do anything with the information.
Pandori launched his covid sequencing program hoping to cut down on the labor needed to investigate potential covid outbreaks, quickly identifying whether cases caught near each other were related or coincidental.
His lab was the first in North America to identify a patient reinfected with covid-19, and later found the B.1.351 variant in a hospitalized man who had just come back from South Africa. With rapid contact tracing, the health department was able to prevent it from spreading.
But county health departments have shifted their priorities away from those boots-on-the-ground investigations in response to public focus on watching for known variants of concern, he says. It’s a move he’s quite skeptical of.
“My initial vision of using it as an epidemiological and disease investigation tool has been usurped by using this as a variant scan,” says Pandori. “It’s kind of the new phase in lab testing. We’ve gone from not having enough testing, period, to not having enough genetic sequencing, I guess. That’s what people are saying now.”
(Pandori is not the only one whose research interests have been waylaid by a focus on surveillance. Krueger, of LabCorp, built the company’s covid sequencing program hoping to study how variants evolve within individual patients. “The currency these days seems to be, how many full genomes can you submit to the different databases?” he says.)
Each month, Pandori’s lab sends 40 samples to the CDC, as requested. The team also sequences 64 of their own samples a day. When they don’t have enough recent samples, they dip into the archives; so far they’ve gotten all the way back to samples from November.
As for sequencing 5% of Nevada’s cases, the majority of tests in the state are conducted by private labs, which generally discard the samples before they can be sequenced. “Specimens that get tested by private labs, or antigen testing, those are lost to surveillance,” he says.
Pandori says he hasn’t heard from the CDC or the public health department about variant data from the CDC’s labs program.
Do it because you have a question
The US may face unique difficulties in connecting variant sequences to their real-world behavior, but every system faces its own challenges. Even countries with well-developed national healthcare systems are struggling to wrangle the enormous amounts of data it will take to really understand what these genetic changes are doing.
In fact, there are few governments doing the work, and perhaps only one doing it successfully at scale.
COG-UK, a consortium of academic and government labs in Britain, organized the first major covid sequencing effort in the world and is widely considered the shining star of the field. Its scientists have not only sequenced almost twice as many samples as the US but were also the first to identify and characterize a variant with increased transmission.
They’ve done it all for under £50 million ($69 million), according to Leigh Jackson, the consortium’s scientific project manager. “It’s quite eye-watering to compare our costs with what the private sector is charging for these types of services,” he says, noting that most of the labor has come from academic labs, which are primarily charging them for materials.
“Overwhelmingly, objective number one is going to be awareness of vaccine escape mutations in the real world. It’s going to happen. Because we have such widespread coverage and capacity now, we should be able to see them pretty quickly,” says Jackson.
That work is possible because public health and medicine in the UK are both nationalized, so tests and vaccine records are all tagged with patients’ unique NHS number. COG-UK needs only a few data-sharing agreements to link all 400,000 samples it has sequenced back to vaccine lists and top-level hospital data. That’s not to say combining those datasets is easy; the group is currently building a streamlined system to tie the remaining disconnected systems together, automating the upload of new data and making it easier for partners to access.
Jackson is happy to hear about the expansion of well-designed sequencing programs, but he takes issue with mass sequencing done without clearly defined goals.
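The practical difference a shared identifier makes can be shown with a toy join. In the Python sketch below, a hypothetical patient_id field stands in for the NHS number, and every record is made up. Sequenced cases that carry the identifier can be matched to vaccination and hospitalization status; de-identified samples, like those in the US surveillance contracts, have nothing to join on and come back as unknown.

```python
from collections import Counter

# Hypothetical, made-up records. "patient_id" stands in for an identifier such
# as the UK's NHS number that appears in every dataset being linked.
SEQUENCES = [
    {"patient_id": "P001", "lineage": "lineage_A"},
    {"patient_id": "P002", "lineage": "lineage_B"},
    {"patient_id": "P003", "lineage": "lineage_B"},
    {"patient_id": "P004", "lineage": "lineage_A"},
]
VACCINATED_IDS = {"P002", "P003"}  # hypothetical immunization-registry extract
HOSPITALIZED_IDS = {"P003"}        # hypothetical hospital admissions extract


def link(sequences: list[dict], vaccinated: set[str], hospitalized: set[str]) -> list[dict]:
    """Attach vaccination and hospitalization status to each sequenced case
    by matching on the shared identifier."""
    linked = []
    for seq in sequences:
        pid = seq.get("patient_id")  # None for a de-identified sample
        linked.append({
            **seq,
            # None means "unknown": with no identifier there is nothing to join on.
            "vaccinated": None if pid is None else pid in vaccinated,
            "hospitalized": None if pid is None else pid in hospitalized,
        })
    return linked


def breakthrough_counts(linked: list[dict]) -> Counter:
    """Count sequenced cases known to be in vaccinated people, per lineage."""
    return Counter(rec["lineage"] for rec in linked if rec["vaccinated"])


if __name__ == "__main__":
    identified = link(SEQUENCES, VACCINATED_IDS, HOSPITALIZED_IDS)
    print("Linked:", breakthrough_counts(identified))

    # The same samples after de-identification: the lineages survive,
    # but the link to vaccination status does not.
    deidentified = [{"lineage": s["lineage"]} for s in SEQUENCES]
    print("De-identified:", breakthrough_counts(link(deidentified, VACCINATED_IDS, HOSPITALIZED_IDS)))
```

With these toy records, linking finds two sequenced cases in vaccinated people, both lineage_B; stripping the identifiers first leaves the same sequences but no breakthrough cases to count.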
“Don’t do it because it’s a vote winner, or it looks good, or it makes people happy. Do it because you have a question,” he says. “If you don’t, then please stop using up all of our Illumina reagents. Our supply chains have gone down the drain since the US announced they were going to up sequencing capacity.”
From sample to the “so what?”
In public health—unlike in basic research—knowledge is only power if it comes with action. It’s what Quick from the Rockefeller Foundation calls “going from the sample to the ‘so what?’”
Sequences need to be connected to immunization data on a massive scale to say anything about vaccine efficacy. Decision-makers need to respond to variants if sequencing them is going to matter. (Right now, many US states are reopening movie theaters and indoor dining, despite clear evidence that more transmissible strains are driving cases up around the country.)
Warmbrod from Johns Hopkins hopes this money will be used proactively, with an eye toward the future, instead of reactively.
“When I go back and look at papers that are six, seven years old, it’s like, ‘Oh god, we’ve known about this exact problem for years, and we did nothing,’” she says. “Whatever tools and infrastructure we build now, they can be used for a lot more than just covid.”
MacCannell feels the same way. “Our role is really to figure out how to expand genomic surveillance across the US public health system in ways that aren’t just covid-specific. We want to take lessons learned, and apply them broadly.”
It’s to everyone’s benefit if this vast injection of money is used not just in response to one crisis, but in preparation for the next one. It offers a real opportunity to fix cracks in our public health system, and build stable institutions that bridge health disparities, respond strongly to potential threats, and keep functioning in a crisis.
At the same time, if the CDC is to make the most of its role as a national public health agency, it should be using all the resources at its disposal, including this massive repository of real-world variant sequences, to track real-world behavior such as evasion of vaccine-induced immunity.
Failing to do so could have deadly consequences.
“We’re setting out to immunize the planet, and I’m quite concerned,” says Quick. “We need to share this data so we don’t invest a huge amount of time and effort in immunizing as many people as possible, only to find it was much less effective than we thought. That’s more lives lost, and more credibility lost for vaccines.”
This story is part of the Pandemic Technology Project, supported by The Rockefeller Foundation.