An endlessly changing playground teaches AIs how to multitask

DeepMind has developed a vast candy-colored virtual playground that teaches AIs general skills by endlessly changing the tasks it sets them. Instead of developing just the skills needed to solve a particular task, the AIs learn to experiment and explore, picking up skills they then use to succeed in tasks they’ve never seen before. It is a small step toward general intelligence.

What is it? XLand is a video-game-like 3D world that the AI players sense in color. The playground is managed by a central AI that sets the players billions of different tasks by changing the environment, the game rules, and the number of players. Both the players and the playground manager use reinforcement learning to improve by trial and error.

During training, the players first face simple one-player games, such as finding a purple cube or placing a yellow ball on a red floor. They advance to more complex multiplayer games like hide and seek or capture the flag, where teams compete to be the first to find and grab their opponent’s flag. The playground manager has no specific goal but aims to improve the general capability of the players over time.

Why is this cool? AIs like DeepMind’s AlphaZero have beaten the world’s best human players at chess and Go. But they can only learn one game at a time. As DeepMind cofounder Shane Legg put it when I spoke to him last year, it’s like having to swap out your chess brain for your Go brain each time you want to switch games.

Researchers are now trying to build AIs that can learn multiple tasks at once, which means teaching them general skills that make it easier to adapt.

Video: Having learned to experiment, these bots improvised a ramp.

One exciting trend in this direction is open-ended learning, where AIs are trained on many different tasks without a specific goal. In many ways, this is how humans and other animals seem to learn, via aimless play. But this requires a vast amount of data. XLand generates that data automatically, in the form of an endless stream of challenges. It is similar to POET, an AI training dojo where two-legged bots learn to navigate obstacles in a 2D landscape. XLand’s world is much more complex and detailed, however.

XLand is also an example of AI learning to make itself, or what Jeff Clune, who helped develop POET and leads a team working on this topic at OpenAI, calls AI-generating algorithms (AI-GAs). “This work pushes the frontiers of AI-GAs,” says Clune. “It is very exciting to see.”

What did they learn? Some of DeepMind’s XLand AIs played 700,000 different games in 4,000 different worlds, encountering 3.4 million unique tasks in total. Instead of learning the best thing to do in each situation, which is what most existing reinforcement-learning AIs do, the players learned to experiment. They moved objects around to see what happened, or used one object as a tool to reach another or to hide behind, until they beat the task at hand.

In the videos you can see the AIs chucking objects around until they stumble on something useful: a large tile, for example, becomes a ramp up to a platform. It is hard to know for sure if all such outcomes are intentional or happy accidents, say the researchers. But they happen consistently.

AIs that learned to experiment had an advantage in most tasks, even ones that they had not seen before. The researchers found that after just 30 minutes of training on a complex new task, the XLand AIs adapted to it quickly. But AIs that had not spent time in XLand could not learn these tasks at all.