Why Meta’s latest large language model survived only three days online
On November 15 Meta unveiled a new large language model called Galactica, designed to assist scientists. But instead of landing with the big bang Meta hoped for, Galactica has died with a whimper after three days of intense criticism. Yesterday the company took down the public demo that it had encouraged everyone to try out.
Meta’s misstep—and its hubris—show once again that Big Tech has a blind spot about the severe limitations of large language models. There is a large body of research that highlights the flaws of this technology, including its tendencies to reproduce prejudice and assert falsehoods as facts.
However, Meta and other companies working on large language models, including Google, have failed to take that research seriously.
Galactica is a large language model for science, trained on 48 million examples of scientific articles, websites, textbooks, lecture notes, and encyclopedias. Meta promoted its model as a shortcut for researchers and students. In the company’s words, Galactica “can summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.”
But the shiny veneer wore through fast. Like all language models, Galactica is a mindless bot that cannot tell fact from fiction. Within hours, scientists were sharing its biased and incorrect results on social media.
“I am both astounded and unsurprised by this new effort,” says Chirag Shah at the University of Washington, who studies search technologies. “When it comes to demoing these things, they look so fantastic, magical, and intelligent. But people still don’t seem to grasp that in principle such things can’t work the way we hype them up to.”
Asked for a statement on why it had removed the demo, Meta pointed MIT Technology Review to a tweet that says: “Thank you everyone for trying the Galactica model demo. We appreciate the feedback we have received so far from the community, and have paused the demo for now. Our models are available for researchers who want to learn more about the work and reproduce results in the paper.”
A fundamental problem with Galactica is that it cannot distinguish truth from falsehood, a basic requirement for a language model designed to generate scientific text. People found that it made up fake papers (sometimes attributing them to real authors) and generated wiki articles about the history of bears in space as readily as ones about protein complexes and the speed of light. It’s easy to spot fiction when it involves space bears, but harder when the subject is one users may not know much about.
Many scientists pushed back hard. Michael Black, director at the Max Planck Institute for Intelligent Systems in Germany, who works on deep learning, tweeted: “In all cases, it was wrong or biased but sounded right and authoritative. I think it’s dangerous.”
Even the more positive opinions came with clear caveats: “Excited to see where this is headed!” tweeted Miles Cranmer, an astrophysicist at Princeton. “You should never keep the output verbatim or trust it. Basically, treat it like an advanced Google search of (sketchy) secondary sources!”
Galactica also has problematic gaps in what it can handle. When asked to generate text on certain topics, such as “racism” and “AIDS,” the model responded with: “Sorry, your query didn’t pass our content filters. Try again and keep in mind this is a scientific language model.”
The Meta team behind Galactica argues that language models are better than search engines. “We believe this will be the next interface for how humans access scientific knowledge,” the researchers write.
This is because language models can “potentially store, combine, and reason about” information. But that “potentially” is crucial. It’s a coded admission that language models cannot yet do all these things. And they may never be able to.
“Language models are not really knowledgeable beyond their ability to capture patterns of strings of words and spit them out in a probabilistic manner,” says Shah. “It gives a false sense of intelligence.”
Gary Marcus, a cognitive scientist at New York University and a vocal critic of deep learning, gave his view in a Substack post titled “A Few Words About Bullshit,” saying that the ability of large language models to mimic human-written text is nothing more than “a superlative feat of statistics.”
And yet Meta is not the only company championing the idea that language models could replace search engines. For the last couple of years, Google has been promoting its language model PaLM as a way to look up information.
It’s a tantalizing idea. But suggesting that the human-like text such models generate will always contain trustworthy information, as Meta appeared to do in its promotion of Galactica, is reckless and irresponsible. It was an unforced error.
And it wasn’t just the fault of Meta’s marketing team. Yann LeCun, a Turing Award winner and Meta’s chief scientist, defended Galactica to the end. On the day the model was released, LeCun tweeted: “Type a text and Galactica will generate a paper with relevant references, formulas, and everything.” Three days later, he tweeted: “Galactica demo is off line for now. It’s no longer possible to have some fun by casually misusing it. Happy?”
It’s not quite Meta’s Tay moment. Recall that in 2016, Microsoft launched a chatbot called Tay on Twitter—then shut it down 16 hours later when Twitter users turned it into a racist, homophobic sexbot. But Meta’s handling of Galactica smacks of the same naivete.
“Big tech companies keep doing this—and mark my words, they will not stop—because they can,” says Shah. “And they feel like they must—otherwise someone else might. They think that this is the future of information access, even if nobody asked for that future.”