Google DeepMind’s game-playing AI just found another way to make code faster

DeepMind’s run of discoveries in fundamental computer science continues. Last year the company used a version of its game-playing AI AlphaZero to find new ways to speed up matrix multiplication, a crucial piece of math at the heart of many different kinds of code, beating a 50-year-old record.

Now it has pulled the same trick again, twice. Using a new version of AlphaZero called AlphaDev, the UK-based firm (recently renamed Google DeepMind after a merger with its sister company’s AI lab in April) has discovered a way to sort items in a list up to 70% faster than the best existing method.

It has also found a way to speed up a key algorithm used in cryptography by 30%. These algorithms are among the most common building blocks in software. Small speed-ups can make a huge difference, cutting costs and saving energy.

“Moore’s Law is coming to an end, where chips are approaching their fundamental physical limits,” says Daniel Mankowitz, a research scientist at Google DeepMind. “We need to find new and innovative ways of optimizing computing.”

“It’s an interesting new approach,” says Peter Sanders, who studies the design and implementation of efficient algorithms at the Karlsruhe Institute of Technology in Germany and who was not involved in the work. “Sorting is still one of the most widely used subroutines in computing,” he says.

DeepMind published its results in Nature today. But the techniques that AlphaDev discovered are already being used by millions of software developers. In January 2022, DeepMind submitted its new sorting algorithms to the maintainers of LLVM’s libc++, a widely used implementation of the standard library for C++, one of the most popular programming languages in the world. After two months of rigorous independent vetting, AlphaDev’s algorithms were added to the library. This was the first change to the library’s sorting algorithms in more than a decade and the first update ever to involve an algorithm discovered using AI.

DeepMind added its other new algorithms to Abseil, an open-source collection of prewritten C++ algorithms that can be used by anybody coding with C++. These cryptography algorithms compute numbers called hashes that can be used as unique IDs for any kind of data. DeepMind estimates that its new algorithms are now being used trillions of times a day.
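To make the idea of a hash as an ID concrete, here is a minimal sketch. It uses SHA-256 from Python's standard library purely as a stand-in; it is not the hash function in Abseil, and the helper name content_id is made up for this illustration.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a fixed-length hexadecimal ID for any blob of bytes.

    SHA-256 is an assumption chosen for illustration only; it is not
    the hashing code that DeepMind contributed to Abseil.
    """
    return hashlib.sha256(data).hexdigest()


# The same data always yields the same ID, however long the input is.
first = content_id(b"a short note")
second = content_id(b"a short note")
other = content_id(b"a slightly different note")
```

Here first and second are identical, while other differs, which is what lets a hash act as a compact label for a piece of data.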

AlphaDev is built on top of AlphaZero, the reinforcement-learning model that DeepMind trained to master games such as Go and chess. The company’s breakthrough was to treat the problem of finding a faster algorithm as a game and then train its AI to win it—the same approach it used last year to speed up matrix multiplications.

In AlphaDev’s case, the game involves choosing computer instructions and placing them in order so that the resulting lines of code make up an algorithm. AlphaDev wins the game if the algorithm is both correct and faster than existing ones. It sounds simple, but to play well, AlphaDev must search through an astronomical number of possible moves.  
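The winning condition can be sketched in a few lines. This is a simplified illustration, not DeepMind's code: a candidate algorithm is represented as a list of compare-and-swap steps rather than real assembly, correctness is checked by trying every ordering of the input, and the number of steps stands in for speed. All function names are hypothetical.

```python
from itertools import permutations


def apply_steps(steps, values):
    """Run a list of compare-and-swap steps over a copy of the input."""
    out = list(values)
    for i, j in steps:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out


def is_correct(steps, n):
    """A candidate is correct if it sorts every ordering of n items."""
    return all(
        apply_steps(steps, order) == sorted(order)
        for order in permutations(range(n))
    )


def wins(candidate, baseline, n):
    """Win only if the candidate is correct AND shorter than the baseline.

    Step count is an assumed proxy for running time in this sketch.
    """
    return is_correct(candidate, n) and len(candidate) < len(baseline)


baseline = [(0, 1), (1, 2), (0, 1), (1, 2)]  # correct, but one step too many
candidate = [(0, 1), (1, 2), (0, 1)]         # correct and shorter
broken = [(0, 1), (1, 2)]                    # shorter still, but wrong
```

With these definitions, wins(candidate, baseline, 3) is true, while wins(broken, baseline, 3) is false because the shorter program fails on some inputs.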

DeepMind chose to work with assembly, a programming language that can be used to give specific instructions for how to move numbers around on a computer chip. Few humans write in assembly; instead, code written in languages like C++ is translated into it before it is run. The advantage of assembly is that it allows algorithms to be broken down into fine-grained steps, which is a good starting point if you’re looking for shortcuts.

Computer chips have different slots where numbers get put and processed. Assembly includes basic instructions for manipulating what’s in these slots, like mov(A,B), which tells a computer to move the number that’s in slot A to slot B, and cmp(A,B), which tells the computer to check if what’s in slot A is less than, equal to, or greater than what’s in slot B. Long sequences of such instructions can carry out everything that computers do.
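A toy interpreter shows how a handful of such instructions can do useful work. This is a sketch under stated assumptions, not real x86 assembly: it follows the article's convention that mov(A,B) copies slot A into slot B, and it adds a conditional move, cmovg, which the article does not mention but which exists on real chips. The slot names and the run function are invented for this example.

```python
def run(program, slots):
    """Execute a list of (op, a, b) instructions on a dict of named slots."""
    regs = dict(slots)
    flag = 0  # outcome of the latest cmp: -1 (less), 0 (equal), 1 (greater)
    for op, a, b in program:
        if op == "mov":
            regs[b] = regs[a]  # copy slot a into slot b
        elif op == "cmp":
            flag = (regs[a] > regs[b]) - (regs[a] < regs[b])
        elif op == "cmovg":
            # Assumed extra instruction: copy a into b only if the
            # latest cmp found its first slot greater than its second.
            if flag > 0:
                regs[b] = regs[a]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs


# Put the smaller of A and B in slot P and the larger in slot Q.
ORDER_TWO = [
    ("mov", "A", "P"),
    ("mov", "B", "Q"),
    ("cmp", "A", "B"),
    ("cmovg", "B", "P"),
    ("cmovg", "A", "Q"),
]

result = run(ORDER_TWO, {"A": 5, "B": 2})
```

After the run, slot P holds 2 and slot Q holds 5. Sorting three or more items chains together more of the same kind of step.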

AlphaDev plays a move in the game by adding a new assembly instruction to the algorithm it is building. To start, AlphaDev would add instructions at random, generating algorithms that would not run. Over time, just as AlphaZero did with board games, it learned to play winning moves. It added instructions that led to algorithms that not only ran, but were correct and fast.

DeepMind focused on algorithms for sorting short lists of three to five items. Such algorithms get called over and over again in programs that sort longer lists. Speed-ups in these short algorithms will therefore have a cumulative knock-on effect.
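The knock-on effect is easy to see in a sketch. The merge sort below is an assumption chosen for brevity, not the algorithm used in C++ libraries; what matters is that it hands every piece of three items or fewer to a fixed short routine and counts how often that routine runs.

```python
small_sort_calls = 0  # how many times the short routine has been used


def sort_small(items):
    """Sort at most three items with a fixed set of compare-and-swap steps."""
    global small_sort_calls
    small_sort_calls += 1
    a = list(items)
    if len(a) >= 2 and a[0] > a[1]:
        a[0], a[1] = a[1], a[0]
    if len(a) == 3:
        if a[1] > a[2]:
            a[1], a[2] = a[2], a[1]
        if a[0] > a[1]:
            a[0], a[1] = a[1], a[0]
    return a


def merge_sort(items):
    """Sort a longer list by splitting it until the short routine applies."""
    if len(items) <= 3:
        return sort_small(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


data = [(i * 7919) % 1000 for i in range(1000)]  # a scrambled list of 1,000 numbers
result = merge_sort(data)
```

Sorting this single list of 1,000 numbers calls sort_small hundreds of times, so shaving even one instruction off the short routine is repaid on every call.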

But short algorithms have also been studied and optimized by humans for decades. Mankowitz and his colleagues started with an algorithm for sorting a list of three items just as a proof of concept. The best human-devised version of this algorithm involves 18 instructions. They didn’t believe they’d be able to improve on it.

“We honestly didn’t expect to achieve anything better,” says Mankowitz. “But to our surprise, we managed to make it faster. We initially thought this was a mistake or a bug or something, but when we analyzed the program we realized that AlphaDev had actually discovered something.”

AlphaDev found a way to sort a list of three items in 17 instructions instead of 18. What it had discovered was that certain steps could be skipped. “When we looked at it afterwards, we were like, ‘Wow, that definitely makes sense,’” says Mankowitz. “But to discover something like this [without AlphaDev], it requires people that are experts in assembly language.”

AlphaDev could not beat the best human version of the algorithm for sorting a list of four items, which takes 28 instructions. But it beat the best human version for five items, cutting the number of instructions down from 46 to 42. 

That amounts to a significant speed-up. The existing C++ algorithm for sorting a list of five items took around 6.91 nanoseconds on a typical Intel Skylake chip. AlphaDev’s took 2.01 nanoseconds, around 70% faster. 
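The arithmetic behind these figures can be checked directly. The snippet below uses only the numbers quoted above; the helper name reduction is made up for this illustration.

```python
def reduction(before, after):
    """Fraction by which a quantity shrank, relative to where it started."""
    return (before - after) / before


time_cut = reduction(6.91, 2.01)        # about 0.71, the "around 70%" figure
times_faster = 6.91 / 2.01              # about 3.4 times as fast
five_item_cut = reduction(46, 42)       # about 0.087, or roughly 9% fewer instructions
three_item_cut = reduction(18, 17)      # about 0.056, or roughly 6% fewer instructions
```

Note that the drop in running time is much larger than the drop in instruction count, so the speed-up does not come from program length alone.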

DeepMind compares AlphaDev’s discovery to one of AlphaGo’s weird but winning moves in its Go match against grandmaster Lee Sedol in 2016. “All the experts looked at this move and said, ‘This isn’t the right thing to do. This is a poor move,’” says Mankowitz. “But actually it was the right move, and AlphaGo ended up not just winning the game but also influencing the strategies that professional Go players started using.”

Sanders is impressed, but he does not think the results should be oversold. “I agree that machine-learning techniques are increasingly a game-changer in programming, and everybody is expecting that AIs will soon be able to invent new, better algorithms,” he says. “But we are not quite there yet.”

For one thing, Sanders points out that AlphaDev only uses a subset of the instructions available in assembly. Many existing sorting algorithms use instructions that AlphaDev did not try, he says. This makes it harder to compare AlphaDev with the best rival approaches.

It’s true that AlphaDev has its limits. The longest algorithm it produced was 130 instructions long, for sorting a list of up to five items. At each step, AlphaDev picked from 297 possible assembly instructions (out of many more). “Beyond 297 instructions and assembly games of more than 130 instructions long, learning became slow,” says Mankowitz.

That’s because even with 297 instructions (or game moves), the number of possible algorithms AlphaDev could construct is larger than the possible number of games in chess (10^120) and the number of atoms in the universe (which is believed to be around 10^80).
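A rough count gives a sense of the scale. This is a back-of-the-envelope sketch, not DeepMind's own calculation: it assumes a program of exactly 130 instructions with a free choice among 297 instructions at every step, and it counts every such sequence whether or not it would run.

```python
import math

INSTRUCTION_CHOICES = 297  # options at each step, as quoted above
PROGRAM_LENGTH = 130       # longest program AlphaDev produced, as quoted above

# Python integers have no size limit, so the count can be computed exactly.
sequences = INSTRUCTION_CHOICES ** PROGRAM_LENGTH

# The power of ten that the count corresponds to (a little over 321).
exponent = PROGRAM_LENGTH * math.log10(INSTRUCTION_CHOICES)

beats_chess = sequences > 10 ** 120
beats_atoms = sequences > 10 ** 80
```

Under these assumptions the count comes to roughly 10^321, far beyond both 10^120 and 10^80.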

For longer algorithms, the team plans to adapt AlphaDev to work with C++ instructions instead of assembly. With less fine-grained control AlphaDev might miss certain shortcuts, but the approach would be applicable to a wider range of algorithms.

Sanders would also like to see a more exhaustive comparison with the best human-devised approaches, especially for longer algorithms. DeepMind says that’s part of its plan. Mankowitz wants to combine AlphaDev with the best human-devised methods, getting the AI to build on human intuition rather than starting from scratch.

After all, there may be more speed-ups to be found. “For a human to do this, it requires significant expertise and a huge amount of hours—maybe days, maybe weeks—to look through these programs and identify improvements,” says Mankowitz. “As a result, it hasn’t been attempted before.”
