AI “godfather” Yoshua Bengio has joined a UK project to prevent AI catastrophes
Yoshua Bengio, a Turing Award winner who is considered one of the “godfathers” of modern AI, is throwing his weight behind a project funded by the UK government to embed safety mechanisms into AI systems.
The project, called Safeguarded AI, aims to build an AI system that can check whether other AI systems deployed in critical areas are safe. Bengio is joining the program as scientific director and will provide critical input and scientific advice. The project, which will receive £59 million over the next four years, is being funded by the UK’s Advanced Research and Invention Agency (ARIA), which was launched in January last year to invest in potentially transformational scientific research.
Safeguarded AI’s goal is to build AI systems that can offer quantitative guarantees, such as a risk score, about their effect on the real world, says David “davidad” Dalrymple, the program director for Safeguarded AI at ARIA. The idea is to supplement human testing with mathematical analysis of new systems’ potential for harm.
The project aims to build AI safety mechanisms by combining scientific world models, which are essentially simulations of the world, with mathematical proofs. These proofs would include explanations of the AI’s work, and humans would be tasked with verifying whether the AI model’s safety checks are correct.
Bengio says he wants to help ensure that future AI systems cannot cause serious harm.
“We’re currently racing toward a fog behind which might be a precipice,” he says. “We don’t know how far the precipice is, or if there even is one, so it might be years, decades, and we don’t know how serious it could be … We need to build up the tools to clear that fog and make sure we don’t cross into a precipice if there is one.”
Science and technology companies don’t have a way to give mathematical guarantees that AI systems are going to behave as programmed, he adds. This unreliability, he says, could lead to catastrophic outcomes.
Dalrymple and Bengio argue that current techniques to mitigate the risk of advanced AI systems—such as red-teaming, where people probe AI systems for flaws—have serious limitations and can’t be relied on to ensure that critical systems don’t go off-piste.
Instead, they hope the program will provide new ways to secure AI systems that rely less on human efforts and more on mathematical certainty. The vision is to build a “gatekeeper” AI, which is tasked with understanding and reducing the safety risks of other AI agents. This gatekeeper would ensure that AI agents functioning in high-stakes sectors, such as transport or energy systems, operate as we want them to. The idea is to collaborate with companies early on to understand how AI safety mechanisms could be useful for different sectors, says Dalrymple.
The complexity of advanced systems means we have no choice but to use AI to safeguard AI, argues Bengio. “That’s the only way, because at some point these AIs are just too complicated. Even the ones that we have now, we can’t really break down their answers into human, understandable sequences of reasoning steps,” he says.
The next step—actually building models that can check other AI systems—is also where Safeguarded AI and ARIA hope to change the status quo of the AI industry.
ARIA is also offering funding to people or organizations in high-risk sectors such as transport, telecommunications, supply chains, and medical research to help them build applications that might benefit from AI safety mechanisms. ARIA is offering applicants a total of £5.4 million in the first year, and a further £8.2 million in a subsequent year. The deadline for applications is October 2.
The agency is also casting a wide net for people who might be interested in building Safeguarded AI’s safety mechanism through a nonprofit organization. ARIA is considering up to £18 million to set up this organization and will be accepting funding applications early next year.
The program is looking for proposals to start a nonprofit with a diverse board that spans many different sectors, so that the work is done in a reliable, trustworthy way, Dalrymple says. This is similar to what OpenAI was initially set up to do before changing its strategy to be more product- and profit-oriented.
The organization’s board will not just be responsible for holding the CEO accountable; it will even weigh in on decisions about whether to undertake certain research projects, and whether to release particular papers and APIs, he adds.
The Safeguarded AI project is part of the UK’s mission to position itself as a pioneer in AI safety. In November 2023, the country hosted the very first AI Safety Summit, which gathered world leaders and technologists to discuss how to build the technology in a safe way.
While the funding program has a preference for UK-based applicants, ARIA is looking for global talent that might be interested in coming to the UK, says Dalrymple. ARIA also has an intellectual-property mechanism for funding for-profit companies abroad, which allows royalties to flow back to the UK.
Bengio says he was drawn to the project to promote international collaboration on AI safety. He chairs the International Scientific Report on the Safety of Advanced AI, which involves 30 countries as well as the EU and UN. A vocal advocate for AI safety, he has been part of an influential lobby warning that superintelligent AI poses an existential risk.
“We need to bring the discussion of how we are going to address the risks of AI to a global, larger set of actors,” says Bengio. “This program is bringing us closer to this.”