What the history of AI tells us about its future

On May 11, 1997, Garry Kasparov fidgeted in his plush leather chair in the Equitable Center in Manhattan, anxiously running his hands through his hair. It was the final game of his match against IBM’s Deep Blue supercomputer, a crucial tiebreaker in the showdown between human and silicon, and things were not going well. Aquiver with self-recrimination after making a deep blunder early in the game, Kasparov was boxed into a corner.

A high-level chess game usually takes at least four hours, but Kasparov realized he was doomed before an hour was up. He announced he was resigning—and leaned over the chessboard to stiffly shake the hand of Joseph Hoane, an IBM engineer who helped develop Deep Blue and had been moving the computer’s pieces around the board.

Then Kasparov lurched out of his chair to walk toward the audience. He shrugged haplessly. At its finest moment, he later said, the machine “played like a god.”

For anyone interested in artificial intelligence, the grand master’s defeat rang like a bell. Newsweek called the match “The Brain’s Last Stand”; another headline dubbed Kasparov “the defender of humanity.” If AI could beat the world’s sharpest chess mind, it seemed that computers would soon trounce humans at everything—with IBM leading the way.

That isn’t what happened, of course. Indeed, when we look back now, 25 years later, we can see that Deep Blue’s victory wasn’t so much a triumph of AI as a kind of death knell. It was a high-water mark for old-school computer intelligence, the laborious handcrafting of endless lines of code, which would soon be eclipsed by a rival form of AI: the neural net, and in particular the technique known as “deep learning.” For all the weight it threw around, Deep Blue was the lumbering dinosaur about to be killed by an asteroid; neural nets were the little mammals that would survive and transform the planet. Yet even today, deep into a world chock-full of everyday AI, computer scientists are still arguing about whether machines will ever truly “think.” And when it comes to answering that question, Deep Blue may get the last laugh.

When IBM began work to create Deep Blue in 1989, AI was in a funk. The field had been through multiple roller-coaster cycles of giddy hype and humiliating collapse. The pioneers of the ’50s had claimed that AI would soon see huge advances; mathematician Claude Shannon predicted that “within a matter of ten or fifteen years, something will emerge from the laboratories which is not too far from the robot of science fiction.” This didn’t happen. And each time inventors failed to deliver, investors felt burned and stopped funding new projects, creating an “AI winter” in the ’70s and again in the ’80s.

The reason they failed—we now know—is that AI creators were trying to handle the messiness of everyday life using pure logic. That’s how they imagined humans did it. And so engineers would patiently write out a rule for every decision their AI needed to make.

The problem is, the real world is far too fuzzy and nuanced to be managed this way. Engineers carefully crafted their clockwork masterpieces—or “expert systems,” as they were called—and they’d work reasonably well until reality threw them a curveball. A credit card company, say, might make a system to automatically approve credit applications, only to discover they’d issued cards to dogs or 13-year-olds. The programmers never imagined that minors or pets would apply for a card, so they’d never written rules to accommodate those edge cases. Such systems couldn’t learn a new rule on their own.
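
The brittleness described above can be illustrated with a toy sketch. Everything in it is hypothetical: the `Application` fields, the income threshold, and the `approve` function are invented for illustration and are not drawn from any real credit system. The point is only that a rule nobody wrote is a case nobody handles.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    annual_income: int      # dollars (illustrative field)
    has_bank_account: bool
    age: int                # collected, but no rule below ever checks it


def approve(app: Application) -> bool:
    """Hand-written rules, in the spirit of an expert system."""
    if not app.has_bank_account:
        return False
    if app.annual_income < 20_000:  # made-up threshold
        return False
    # No rule mentions age, because nobody imagined a child applying.
    return True


adult = Application("Pat", 55_000, True, 41)
teen = Application("Sam", 25_000, True, 13)

print(approve(adult))  # True
print(approve(teen))   # True: the edge case slips straight through
```

Fixing this means a programmer adding an age rule by hand; nothing in the program can notice the gap on its own.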

AI built via handcrafted rules was “brittle”: when it encountered a weird situation, it broke. By the early ’90s, troubles with expert systems had brought on another AI winter.

“A lot of the conversation around AI was like, ‘Come on. This is just hype,’” says Oren Etzioni, CEO of the Allen Institute for AI in Seattle, who back then was a young professor of computer science beginning a career in AI.

In that landscape of cynicism, Deep Blue arrived like a weirdly ambitious moonshot.

The project grew out of work on Deep Thought, a chess-playing computer built at Carnegie Mellon by Murray Campbell, Feng-hsiung Hsu, and others. Deep Thought was awfully good; in 1988, it became the first chess AI to beat a grand master, Bent Larsen. The Carnegie Mellon team had figured out better algorithms for assessing chess moves, and they’d also created custom hardware that speedily crunched through them. (The name “Deep Thought” came from the laughably delphic AI in The Hitchhiker’s Guide to the Galaxy—which, when asked the meaning of life, arrived at the answer “42.”)

IBM got wind of Deep Thought and decided it would mount a “grand challenge,” building a computer so good it could beat any human. In 1989 it hired Hsu and Campbell, and tasked them with besting the world’s top grand master. Chess had long been, in AI circles, symbolically potent—two opponents facing each other on the astral plane of pure thought. It’d certainly generate headlines if they could trounce Kasparov.

To build Deep Blue, Campbell and his team had to craft new chips for calculating chess positions even more rapidly, and hire grand masters to help improve algorithms for assessing the next moves. Efficiency mattered: there are more possible chess games than atoms in the universe, and even a supercomputer couldn’t ponder all of them in a reasonable amount of time. To play chess, Deep Blue would peer a move ahead, calculate possible moves from there, “prune” ones that seemed unpromising, go deeper along the promising paths, and repeat the process several times.
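
The look-ahead-and-prune loop described above resembles a textbook technique called alpha-beta pruning. The sketch below is a minimal version of that technique on a made-up three-branch game tree. It is not a reconstruction of Deep Blue's code, which ran on custom hardware and relied on grand-master-tuned position scoring; the tree, the scores, and the `search` function are assumptions chosen for illustration.

```python
import math

# A toy game tree: inner nodes are lists of children, leaves are scores
# from the first player's point of view. The numbers are invented.
TREE = [[3, 5], [6, 9], [1, 2]]

visited = []  # leaves the search actually looked at


def search(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    """Return the best score reachable from node, skipping hopeless branches."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        score = search(child, not maximizing, alpha, beta)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:
            break  # prune: the opponent would never allow this line
    return best


best_score = search(TREE)
print(best_score)  # 6
print(visited)     # [3, 5, 6, 9, 1]
```

The leaf scored 2 is never examined: once the third branch is shown to be worth at most 1, it cannot beat the 6 already in hand, so the search abandons it. On a real chess tree the same shortcut discards enormous numbers of positions.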

“We thought it would take five years—it actually took a little more than six,” Campbell says. By 1996, IBM decided it was finally ready to face Kasparov, and it set a match for February. Campbell and his team were still frantically rushing to finish Deep Blue: “The system had only been working for a few weeks before we actually got on the stage,” he says.

It showed. Although Deep Blue won one game, Kasparov won three and took the match. IBM asked for a rematch, and Campbell’s team spent the next year building even faster hardware. By the time they’d completed their improvements, Deep Blue was made of 30 PowerPC processors and 480 custom chess chips; they’d also hired more grand masters—four or five at any given point in time—to help craft better algorithms for parsing chess positions. When Kasparov and Deep Blue met again, in May 1997, the computer was twice as speedy, assessing 200 million chess moves per second.

Even so, IBM still wasn’t confident of victory, Campbell remembers: “We expected a draw.”

The reality was considerably more dramatic. Kasparov dominated in the first game. But in its 36th move in the second game, Deep Blue did something Kasparov did not expect.

He was accustomed to the way computers traditionally played chess, a style born from machines’ sheer brute-force abilities. They were better than humans at short-term tactics; Deep Blue could easily deduce the best choice a few moves out.

But what computers were bad at, traditionally, was strategy—the ability to ponder the shape of a game many, many moves in the future. That’s where humans still had the edge.

Or so Kasparov thought, until Deep Blue’s move in game 2 rattled him. It seemed so sophisticated that Kasparov began worrying: maybe the machine was far better than he’d thought! Convinced he had no way to win, he resigned the second game.

But he shouldn’t have. Deep Blue, it turns out, wasn’t actually that good. Kasparov had failed to spot a move that would have let the game end in a draw. He was psyching himself out: worried that the machine might be far more powerful than it really was, he had begun to see human-like reasoning where none existed.

Knocked off his rhythm, Kasparov kept playing worse and worse. He psyched himself out over and over again. Early in the sixth, winner-takes-all game, he made a move so lousy that chess observers cried out in shock. “I was not in the mood of playing at all,” he later said at a press conference.

IBM benefited from its moonshot. In the press frenzy that followed Deep Blue’s success, the company’s market cap rose $11.4 billion in a single week. Even more significant, though, was that IBM’s triumph felt like a thaw in the long AI winter. If chess could be conquered, what was next? The public’s mind reeled.

“That,” Campbell tells me, “is what got people paying attention.”

The truth is, it wasn’t surprising that a computer beat Kasparov. Most people who’d been paying attention to AI—and to chess—expected it to happen eventually.

Chess may seem like the acme of human thought, but it’s not. Indeed, it’s a mental task that’s quite amenable to brute-force computation: the rules are clear, there’s no hidden information, and a computer doesn’t even need to keep track of what happened in previous moves. It just assesses the position of the pieces right now.

Everyone knew that once computers got fast enough, they’d overwhelm a human. It was just a question of when. By the mid-’90s, “the writing was already on the wall, in a sense,” says Demis Hassabis, head of the AI company DeepMind, part of Alphabet.

Deep Blue’s victory was the moment that showed just how limited hand-coded systems could be. IBM had spent years and millions of dollars developing a computer to play chess. But it couldn’t do anything else.

“It didn’t lead to the breakthroughs that allowed the [Deep Blue] AI to have a huge impact on the world,” Campbell says. They didn’t really discover any principles of intelligence, because the real world doesn’t resemble chess. “There are very few problems out there where, as with chess, you have all the information you could possibly need to make the right decision,” Campbell adds. “Most of the time there are unknowns. There’s randomness.”

But even as Deep Blue was mopping the floor with Kasparov, a handful of scrappy upstarts were tinkering with a radically more promising form of AI: the neural net.

With neural nets, the idea was not, as with expert systems, to patiently write rules for each decision an AI will make. Instead, training and reinforcement strengthen internal connections in rough emulation (as the theory goes) of how the human brain learns.
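
To make the contrast with hand-written rules concrete, here is a minimal sketch of a single artificial neuron learning the logical OR function from four examples. It uses the classic perceptron update rule, which is far simpler than the methods used to train modern deep networks; the names `EXAMPLES`, `weights`, `bias`, and `predict` are chosen here for illustration.

```python
# Training examples for logical OR: (inputs, expected output).
EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights = [0, 0]  # connection strengths, initially blank
bias = 0


def predict(inputs):
    """Fire (1) if the weighted sum of the inputs is positive, else stay quiet (0)."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


# Show the examples repeatedly; nudge the connections after each mistake.
for _ in range(10):
    for inputs, target in EXAMPLES:
        error = target - predict(inputs)  # -1, 0, or +1
        for i, x in enumerate(inputs):
            weights[i] += error * x
        bias += error

print([predict(x) for x, _ in EXAMPLES])  # [0, 1, 1, 1]
print(weights, bias)                      # [1, 1] 0
```

No line of this program states the rule for OR. The behavior emerges from the adjusted weights, which is the basic idea that deep learning scales up to millions of connections.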

The idea had existed since the ’50s. But training a usefully large neural net required lightning-fast computers, tons of memory, and lots of data. None of that was readily available then. Even into the ’90s, neural nets were considered a waste of time.

“Back then, most people in AI thought neural nets were just rubbish,” says Geoff Hinton, an emeritus computer science professor at the University of Toronto, and a pioneer in the field. “I was called a ‘true believer’”—not a compliment.

But by the 2000s, the computer industry was evolving to make neural nets viable. Video-game players’ lust for ever-better graphics created a huge industry in ultrafast graphics-processing units, which turned out to be perfectly suited for neural-net math. Meanwhile, the internet was exploding, producing a torrent of pictures and text that could be used to train the systems.

By the early 2010s, these technical leaps were allowing Hinton and his crew of true believers to take neural nets to new heights. They could now create networks with many layers of neurons (which is what the “deep” in “deep learning” means). In 2012 his team handily won the annual ImageNet competition, where AIs compete to recognize elements in pictures. It stunned the world of computer science: self-learning machines were finally viable.

Ten years into the deep-learning revolution, neural nets and their pattern-recognizing abilities have colonized every nook of daily life. They help Gmail autocomplete your sentences, help banks detect fraud, let photo apps automatically recognize faces, and—in the case of OpenAI’s GPT-3 and DeepMind’s Gopher—write long, human-sounding essays and summarize texts. They’re even changing how science is done; in 2020, DeepMind debuted AlphaFold2, an AI that can predict how proteins will fold—a superhuman skill that can help guide researchers to develop new drugs and treatments.

Meanwhile Deep Blue vanished, leaving no useful inventions in its wake. Chess playing, it turns out, wasn’t a computer skill that was needed in everyday life. “What Deep Blue in the end showed was the shortcomings of trying to handcraft everything,” says DeepMind founder Hassabis.

IBM tried to remedy the situation with Watson, another specialized system, this one designed to tackle a more practical problem: getting a machine to answer questions. It used statistical analysis of massive amounts of text to achieve language comprehension that was, for its time, cutting-edge. It was more than a simple if-then system. But Watson faced unlucky timing: it was eclipsed only a few years later by the revolution in deep learning, which brought in a generation of language-crunching models far more nuanced than Watson’s statistical techniques.

Deep learning has run roughshod over old-school AI precisely because “pattern recognition is incredibly powerful,” says Daphne Koller, a former Stanford professor who founded and runs Insitro, which uses neural nets and other forms of machine learning to investigate novel drug treatments. The flexibility of neural nets—the wide variety of ways pattern recognition can be used—is the reason there hasn’t yet been another AI winter. “Machine learning has actually delivered value,” she says, which is something the “previous waves of exuberance” in AI never did.

The inverted fortunes of Deep Blue and neural nets show how bad we were, for so long, at judging what’s hard—and what’s valuable—in AI.

For decades, people assumed mastering chess would be important because, well, chess is hard for humans to play at a high level. But chess turned out to be fairly easy for computers to master, because it’s so logical.

What was far harder for computers to learn was the casual, unconscious mental work that humans do—like conducting a lively conversation, piloting a car through traffic, or reading the emotional state of a friend. We do these things so effortlessly that we rarely realize how tricky they are, and how much fuzzy, grayscale judgment they require. Deep learning’s great utility has come from being able to capture small bits of this subtle, unheralded human intelligence.

Still, there’s no final victory in artificial intelligence. Deep learning may be riding high now—but it’s amassing sharp critiques, too.

“For a very long time, there was this techno-chauvinist enthusiasm that okay, AI is going to solve every problem!” says Meredith Broussard, a programmer turned journalism professor at New York University and author of Artificial Unintelligence. But as she and other critics have pointed out, deep-learning systems are often trained on biased data—and absorb those biases. The computer scientists Joy Buolamwini and Timnit Gebru discovered that three commercially available visual AI systems were terrible at analyzing the faces of darker-skinned women. Amazon trained an AI to vet résumés, only to find it downranked women.

Though computer scientists and many AI engineers are now aware of these bias problems, they’re not always sure how to deal with them. On top of that, neural nets are also “massive black boxes,” says Daniela Rus, a veteran of AI who currently runs MIT’s Computer Science and Artificial Intelligence Laboratory. Once a neural net is trained, its mechanics are not easily understood even by its creator. It is not clear how it comes to its conclusions—or how it will fail.

It may not be a problem, Rus figures, to rely on a black box for a task that isn’t “safety critical.” But what about a higher-stakes job, like autonomous driving? “It’s actually quite remarkable that we could put so much trust and faith in them,” she says.

This is where Deep Blue had an advantage. The old-school style of handcrafted rules may have been brittle, but it was comprehensible. The machine was complex—but it wasn’t a mystery.

Ironically, that old style of programming might stage something of a comeback as engineers and computer scientists grapple with the limits of pattern matching.

Language generators, like OpenAI’s GPT-3 or DeepMind’s Gopher, can take a few sentences you’ve written and keep on going, writing pages and pages of plausible-sounding prose. But despite some impressive mimicry, Gopher “still doesn’t really understand what it’s saying,” Hassabis says. “Not in a true sense.”

Similarly, visual AI can make terrible mistakes when it encounters an edge case. Self-driving cars have slammed into fire trucks parked on highways, because in all the millions of hours of video they’d been trained on, they’d never encountered that situation. Neural nets have, in their own way, a version of the “brittleness” problem.

What AI really needs in order to move forward, as many computer scientists now suspect, is the ability to know facts about the world—and to reason about them. A self-driving car cannot rely only on pattern matching. It also has to have common sense—to know what a fire truck is, and why seeing one parked on a highway would signify danger.

The problem is, no one knows quite how to build neural nets that can reason or use common sense. Gary Marcus, a cognitive scientist and coauthor of Rebooting AI, suspects that the future of AI will require a “hybrid” approach—neural nets to learn patterns, but guided by some old-fashioned, hand-coded logic. This would, in a sense, merge the benefits of Deep Blue with the benefits of deep learning.

Hard-core aficionados of deep learning disagree. Hinton believes neural networks should, in the long run, be perfectly capable of reasoning. After all, humans do it, “and the brain’s a neural network.” Using hand-coded logic strikes him as bonkers; it’d run into the problem of all expert systems, which is that you can never anticipate all the common sense you’d want to give to a machine. The way forward, Hinton says, is to keep innovating on neural nets—to explore new architectures and new learning algorithms that more accurately mimic how the human brain itself works.

Computer scientists are dabbling in a variety of approaches. At IBM, Deep Blue developer Campbell is working on “neuro-symbolic” AI that works a bit the way Marcus proposes. Etzioni’s lab is attempting to build common-sense modules for AI that include both trained neural nets and traditional computer logic; as yet, though, it’s early days. The future may look less like an absolute victory for either Deep Blue or neural nets, and more like a Frankensteinian approach—the two stitched together.

Given that AI is likely here to stay, how will we humans live with it? Will we ultimately be defeated, like Kasparov with Deep Blue, by AIs so much better at “thinking work” that we can’t compete?

Kasparov himself doesn’t think so. Not long after his loss to Deep Blue, he decided that fighting against an AI made no sense. The machine “thought” in a fundamentally inhuman fashion, using brute-force math. It would always have better tactical, short-term power.

So why compete? Instead, why not collaborate?

After the Deep Blue match, Kasparov invented “advanced chess,” where humans and silicon work together. A human plays against another human—but each also wields a laptop running chess software, to help war-game possible moves.

When Kasparov began running advanced chess matches in 1998, he quickly discovered fascinating differences in the game. Interestingly, amateurs punched above their weight. In one human-with-laptop match in 2005, a pair of amateurs won the top prize, beating out several grand masters.

How could they best superior chess minds? Because the amateurs better understood how to collaborate with the machine. They knew how to rapidly explore ideas, when to accept a machine suggestion and when to ignore it. (Some leagues still hold advanced chess tournaments today.)

This, Kasparov argues, is precisely how we ought to approach the emerging world of neural nets.

“The future,” he told me in an email, lies in “finding ways to combine human and machine intelligences to reach new heights, and to do things neither could do alone.”

Neural nets behave differently from chess engines, of course. But many luminaries agree strongly with Kasparov’s vision of human-AI collaboration. DeepMind’s Hassabis sees AI as a way forward for science, one that will guide humans toward new breakthroughs.

“I think we’re going to see a huge flourishing,” he says, “where we will start seeing Nobel Prize–winning–level challenges in science being knocked down one after the other.” Koller’s firm Insitro is similarly using AI as a collaborative tool for researchers. “We’re playing a hybrid human-machine game,” she says.

Will there come a time when we can build AI so human-like in its reasoning that humans really do have less to offer—and AI takes over all thinking? Possibly. But even these scientists, on the cutting edge, can’t predict when that will happen, if ever.

So consider this Deep Blue’s final gift, 25 years after its famous match. In his defeat, Kasparov spied the real endgame for AI and humans. “We will increasingly become managers of algorithms,” he told me, “and use them to boost our creative output—our adventuresome souls.”

Clive Thompson is a science and technology journalist based in New York City and author of Coders: The Making of a New Tribe and the Remaking of the World.