Self-driving car startup Wayve can now interrogate its vehicles, asking them questions about their driving decisions—and getting answers back. The idea is to use the same tech behind ChatGPT to help train driverless cars.
The company combined its existing self-driving software with a large language model, creating a hybrid model it calls LINGO-1. LINGO-1 syncs up video data and driving data (the actions that the cars take second by second) with natural-language descriptions that capture what the car sees and what it does.
The UK-based firm has had a string of breakthroughs in the last few years. In 2021 it showed that it could take AI trained on the streets of London and use it to drive cars in four other cities across the UK, a challenge that typically requires significant reengineering. Last year it used that same AI to drive more than one kind of vehicle, another industry first. And now it can chat to its cars.
In a demo the company gave me this week, CEO Alex Kendall played footage taken from the camera on one of its Jaguar I-PACE vehicles, jumped to a random spot in the video, and started typing questions: “What’s the weather like?” The weather is cloudy. “What hazards do you see?” There is a school on the left. “Why did you stop?” Because the traffic light is red.
“We saw some remarkable things come up in the last couple of weeks,” said Kendall. “I never would have thought to ask something like this, but look—” He typed: “How many stories is the building on the right?” Three stories.
“Look at that!” he said, sounding like a proud dad. “We never trained it to do that. It’s really amazed us. We see this as a breakthrough in AI safety.”
“I’m impressed with LINGO-1’s capabilities,” says Pieter Abbeel, a robotics researcher at the University of California, Berkeley, and cofounder of the robotics company Covariant, who has played with a demo of the tech. Abbeel asked LINGO-1 what-if questions like “What would you do if the light were green?” “Almost every time it gave a very precise answer,” he says.
By quizzing the self-driving software every step of the way, Wayve hopes to understand exactly why and how its cars make certain decisions. Most of the time the cars drive fine. When they don’t, it’s a problem—as industry frontrunners like Cruise and Waymo have found.
Both of those firms have rolled out small fleets of robotaxis on the streets of a few US cities. But the technology is far from perfect. Cruise’s and Waymo’s cars have been involved in multiple minor collisions (a Waymo car is reported to have killed a dog) and have blocked traffic when they got stuck. San Francisco officials have claimed that in August two Cruise vehicles got in the way of an ambulance carrying an injured person, who later died in hospital. Cruise denies the officials’ account.
Wayve hopes that asking its own cars to explain themselves when they do something wrong will uncover flaws faster than poring over video playbacks or scrolling through error reports alone.
“The most important challenge in self-driving is safety,” says Abbeel. “With a system like LINGO-1, I think you get a much better idea of how well it understands driving in the world.” This makes it easier to identify the weak spots, he says.
The next step is to use language to teach the cars, says Kendall. To train LINGO-1, Wayve got its team of expert drivers—some of them former driving instructors—to talk out loud while driving, explaining what they were doing and why: why they sped up, why they slowed down, what hazards they were aware of. The company uses this data to fine-tune the model, giving it driving tips much as an instructor might coach a human learner. Telling a car how to do something rather than just showing it speeds up the training a lot, says Kendall.
Wayve is not the first to use large language models in robotics. Other companies, including Google and Abbeel’s firm Covariant, are using natural language to quiz or instruct domestic or industrial robots. The hybrid tech even has a name: vision-language-action models (VLAMs). But Wayve is the first to use VLAMs for self-driving.
“People often say an image is worth a thousand words, but in machine learning it’s the opposite,” says Kendall. “A few words can be worth a thousand images.” An image contains a lot of data that’s redundant. “When you’re driving, you don’t care about the sky, or the color of the car in front, or stuff like this,” he says. “Words can focus on the information that matters.”
“Wayve’s approach is definitely interesting and unique,” says Lerrel Pinto, a robotics researcher at New York University. In particular, he likes the way LINGO-1 explains its actions.
But he’s curious about what happens when the model makes stuff up. “I don’t trust large language models to be factual,” he says. “I’m not sure if I can trust them to run my car.”
Upol Ehsan, a researcher at the Georgia Institute of Technology who works on ways to get AI to explain its decision-making to humans, has similar reservations. “Large language models are, to use the technical phrase, great bullshitters,” says Ehsan. “We need to apply a bright yellow ‘caution’ tape and make sure the language generated isn’t hallucinated.”
Wayve is well aware of these limitations and is working to make LINGO-1 as accurate as possible. “We see the same challenges that you see in any large language model,” says Kendall. “It’s certainly not perfect.”
One advantage LINGO-1 has over non-hybrid models is that its responses are grounded in the accompanying video data. In theory, this should make LINGO-1 more truthful.
This is about more than just cars, says Kendall. “There’s a reason why you and I have evolved language: it’s the most efficient way that we know of to communicate complex topics. And I think the same will be true with intelligent machines. The way that we’ll interact with robots in the future will be through language.”
Abbeel agrees. “Zooming out, I think we are about to see a revolution in robotics,” he says.