Google DeepMind says it’s trained an artificial intelligence that can predict which DNA variations in our genomes are likely to cause disease—predictions that could speed diagnosis of rare disorders and possibly yield clues for drug development.
DeepMind, founded in London and acquired by Google 10 years ago, is known for artificial-intelligence programs that play video games and have conquered complex board games like Go. It jumped into medicine when it announced that its program AlphaFold was able to accurately predict the shape of proteins, a problem considered a “grand challenge” in biology.
Now the company says it has fine-tuned that protein model to predict which misspellings found in human DNA are safe to ignore and which are likely to cause disease. The new software, called AlphaMissense, was described today in a report published by the journal Science.
As part of its project, DeepMind says, it is publicly releasing tens of millions of these predictions, but the company isn’t letting others directly download the model because of what it characterizes as potential biosecurity risks should the technique be applied to other species.
Although such computer predictions aren’t intended to make diagnoses on their own, doctors already use them to help locate the genetic causes of mysterious syndromes. In a blog post, DeepMind said its results are part of an effort to uncover “the root cause of disease” and could lead to “faster diagnosis and developing life-saving treatments.”
The three-year project was led by DeepMind engineers Jun Cheng and Žiga Avsec, and the company said it is publicly releasing predictions for 71 million possible variants. Each is what’s known as a missense mutation: a change to a single DNA letter that alters the protein a gene makes.
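A toy sketch can make the definition concrete. The code below is an illustration, not DeepMind’s code; the codon table covers only the two codons needed here, and the example is the well-known sickle-cell change in the HBB gene, where one base swap turns a glutamic-acid codon into a valine codon:

```python
# Illustrative sketch: how a single-letter DNA change can alter the amino
# acid a codon encodes -- the definition of a missense variant.

# Minimal codon table covering just the codons used in this example.
CODON_TABLE = {
    "GAG": "Glu",  # glutamic acid
    "GTG": "Val",  # valine
}

def classify_variant(codon: str, pos: int, new_base: str) -> str:
    """Return 'missense' if the substitution changes the encoded amino acid."""
    mutated = codon[:pos] + new_base + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    return "missense" if before != after else "synonymous"

# The classic sickle-cell variant: GAG (Glu) becomes GTG (Val).
print(classify_variant("GAG", 1, "T"))  # missense
```

AlphaMissense’s task is the hard part this sketch skips: deciding whether a given amino-acid change like Glu-to-Val actually harms the protein’s function.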
“The goal here is, you give me a change to a protein, and instead of predicting the protein shape, I tell you: Is this bad for the human that has it?” says Stephen Hsu, a physicist at Michigan State University who works on genetic problems with AI techniques. “Most of these flips, we just have no idea whether they cause sickness.”
Outside experts said DeepMind’s announcement was the latest in a string of flashy demonstrations whose commercial value remains unclear. “DeepMind is being DeepMind,” says Alex Zhavoronkov, founder of Insilico Medicine, an AI company developing drugs. “Amazing on PR and good work on AI.”
Zhavoronkov says the real test of modern artificial intelligence is whether it can lead to new cures, something that still hasn’t happened. But some AI-designed drugs are in testing, and efforts to create useful new proteins are a particularly hot sector, investors say. One company, Generate Biomedicines, just raised $273 million to create antibodies, and a team of former Meta engineers started EvolutionaryScale, which thinks AI can come up with “programmable cells that seek out and destroy cancer,” according to Forbes.
DeepMind’s new effort has less to do with drugs, however, and more to do with how doctors diagnose rare disease, especially in patients with mystery symptoms, like a newborn with a rash that won’t go away, or an adult suddenly feeling weaker.
With the rise of gene sequencing, doctors can now decode people’s genomes and then scour the DNA data for possible culprits. Sometimes, the cause is clear, like the mutation that leads to cystic fibrosis. But in about 25% of cases where extensive gene sequencing is done, scientists will find a suspicious DNA change whose effects aren’t fully understood, says Heidi Rehm, director of the clinical laboratory at the Broad Institute, in Cambridge, Massachusetts.
Scientists call these mystery mutations “variants of uncertain significance,” and they can appear even in exhaustively studied genes like BRCA1, a notorious hot spot of inherited cancer risk. “There is not a single gene out there that does not have them,” says Rehm.
DeepMind says AlphaMissense can help in the search for answers by using AI to predict which DNA changes are benign and which are “likely pathogenic.” The model joins previously released programs, such as one called PrimateAI, that make similar predictions.
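In practice, predictors of this kind output a pathogenicity score per variant, and labels such as “likely benign” or “likely pathogenic” come from applying cutoffs to that score. The sketch below is a hypothetical illustration of that thresholding step; the cutoff values are placeholder assumptions, not the published ones:

```python
# Hypothetical illustration of turning a pathogenicity score into a label.
# The threshold values are placeholders chosen for the example.

def label_variant(score: float, benign_max: float = 0.34,
                  pathogenic_min: float = 0.56) -> str:
    """Map a 0-to-1 pathogenicity score onto a three-way label."""
    if score <= benign_max:
        return "likely benign"
    if score >= pathogenic_min:
        return "likely pathogenic"
    return "ambiguous"

print(label_variant(0.10))  # likely benign
print(label_variant(0.90))  # likely pathogenic
print(label_variant(0.45))  # ambiguous
```

The middle “ambiguous” band is why clinicians like Rehm treat such scores as one piece of evidence rather than a verdict.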
“There has been a lot of work in this space already, and overall, the quality of these in silico predictors has gotten much better,” says Rehm. However, Rehm says computer predictions are only “one piece of evidence,” which on their own can’t convince her a DNA change is really making someone sick.
Typically, experts don’t declare a mutation pathogenic until they have real-world data from patients, evidence of inheritance patterns in families, and lab tests—information that’s shared through public websites of variants such as ClinVar.
“The models are improving, but none are perfect, and they still don’t get you to pathogenic or not,” says Rehm, who adds that she was “disappointed” that DeepMind seemed to exaggerate the medical certainty of its predictions by describing variants as benign or pathogenic.
DeepMind says the new model is based on AlphaFold, the earlier model for predicting protein shapes. Even though AlphaMissense does something very different, says Pushmeet Kohli, a vice president of research at DeepMind, the software is somehow “leveraging the intuitions it gained” about biology from its previous task. Because it was based on AlphaFold, the new model requires less computer time to run, and therefore less energy, than if it had been built from scratch.
In technical terms, the model is pre-trained, but then adapted to a new task in an additional step called fine-tuning. For this reason, Patrick Malone, a doctor and biologist at KdT Ventures, believes that AlphaMissense is “an example of one of the most important recent methodological developments in AI.”
“The concept is that the fine-tuned AI is able to leverage prior learning,” says Malone. “The pre-training framework is especially useful in computational biology, where we are often limited by access to data at sufficient scale.”
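The pre-train-then-fine-tune recipe Malone describes can be sketched in miniature: a “pre-trained” feature extractor is frozen, and only a small task-specific head is trained on the scarce downstream labels. This is a conceptual toy, not AlphaMissense; the frozen extractor here is just a fixed random projection standing in for a large pre-trained network:

```python
# Conceptual sketch of fine-tuning: freeze a "pre-trained" feature extractor
# and train only a small logistic-regression head on new labels.

import numpy as np

rng = np.random.default_rng(0)

def pretrained_features(x: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen pre-trained network: a fixed random projection."""
    W = np.random.default_rng(42).normal(size=(x.shape[1], 8))
    return np.tanh(x @ W)

# Tiny labeled dataset for the downstream task (1 = pathogenic, 0 = benign).
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Fine-tuning step: gradient descent on the head only; features stay frozen.
F = pretrained_features(X)
w, b = np.zeros(F.shape[1]), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(F @ w + b)))            # predicted probabilities
    grad_w, grad_b = F.T @ (p - y) / len(y), np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

accuracy = np.mean((1 / (1 + np.exp(-(F @ w + b))) > 0.5) == y)
print(f"head-only accuracy: {accuracy:.2f}")
```

Because only the small head is trained, the downstream task needs far fewer labeled examples and far less compute than training everything from scratch, which is the advantage Malone points to.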
DeepMind says it’s provided free access to all its predictions for human genes, as well as all the details needed to fully replicate the work, including computer code. However, it isn’t releasing the whole model for immediate download and use by others because of what it calls a biosecurity risk if it were applied to analyze the genes of species other than humans.
“As part of our commitment to releasing our research breakthroughs safely and responsibly, we will not be sharing model weights, to prevent use in potentially unsafe applications,” the authors wrote in the fine print of their paper.
It’s not obvious what those unsafe applications are, or what non-human species the researchers had in mind. DeepMind didn’t spell them out, but risks could include using an AI to design more dangerous bacteria or a bioweapon.
However, at least one outside expert we spoke to, who asked for anonymity because Google invests in companies he’s started, characterized the restrictions as a transparent effort to stop others from quickly deploying the model for their own uses.
DeepMind denied it was throttling the model for reasons other than safety. The work was assessed both by the Google DeepMind Institute, which studies responsible AI, and by an “outside biosafety expert,” a spokesperson for DeepMind said.
The restriction on the model “primarily limits making predictions on non-human protein sequences,” DeepMind said in a statement. “Not releasing weights prevents others from simply downloading the model and using it in non-human species … hence reducing the likelihood of misuse by bad actors.”