AI lie detectors are better than humans at spotting lies

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here. 

Can you spot a liar? It’s a question I imagine has been on a lot of minds lately, in the wake of various televised political debates. Research has shown that we’re generally pretty bad at telling a truth from a lie.
 
Some believe that AI could help improve our odds, and do better than dodgy old-fashioned techniques like polygraph tests. AI-based lie detection systems could one day be used to help us sift fact from fake news, evaluate claims, and potentially even spot fibs and exaggerations in job applications. The question is whether we will trust them, and whether we should.

In a recent study, Alicia von Schenk and her colleagues developed a tool that was significantly better than people at spotting lies. Von Schenk, an economist at the University of Würzburg in Germany, and her team then ran some experiments to find out how people used it. In some ways, the tool was helpful: the people who made use of it were better at spotting lies. But it also led people to make a lot more accusations.

In their study published in the journal iScience, von Schenk and her colleagues asked volunteers to write statements about their weekend plans. Half the time, people were incentivized to lie; a believable yet untrue statement was rewarded with a small financial payout. In total, the team collected 1,536 statements from 768 people.
 
They then used 80% of these statements to train an algorithm on lies and truths, using Google’s AI language model BERT. When they tested the resulting tool on the final 20% of statements, they found it could successfully tell whether a statement was true or false 67% of the time. That’s significantly better than a typical human; we usually only get it right around half the time.
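For readers who want to see the mechanics, here is a minimal sketch of an 80/20 train-and-test split and of how an accuracy figure like 67% is computed. It is an illustration only, not the team's code: the helper names `split_80_20` and `accuracy` are hypothetical, the statements are stand-in ID numbers, and no language model is involved.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle a copy of items, then split it into 80% train and 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# 1,536 statements, as in the study; the IDs here are placeholders.
statements = list(range(1536))
train, test = split_80_20(statements)
print(len(train), len(test))  # 1228 308
```

With 1,536 statements, a split like this leaves roughly 300 held-out statements for testing, so the accuracy figure is the share of those that the tool labels correctly.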
 
To find out how people might make use of AI to help them spot lies, von Schenk and her colleagues split 2,040 other volunteers into smaller groups and ran a series of tests.
 
One test revealed that when people are given the option to pay a small fee to use an AI tool that can help them detect lies—and earn financial rewards—they still aren’t all that keen on using it. Only a third of the volunteers given that option decided to use the AI tool, possibly because they’re skeptical of the technology, says von Schenk. (They might also be overly optimistic about their own lie-detection skills, she adds.)
 
But that one-third of people really put their trust in the technology. “When you make the active choice to rely on the technology, we see that people almost always follow the prediction of the AI… they rely very much on its predictions,” says von Schenk.

This reliance can shape our behavior. Normally, people tend to assume others are telling the truth. That was borne out in this study—even though the volunteers knew half of the statements were lies, they only marked out 19% of them as such. But that changed when people chose to make use of the AI tool: the accusation rate rose to 58%.
 
In some ways, this is a good thing—these tools can help us spot more of the lies we come across in our lives, like the misinformation we might come across on social media.
 
But it’s not all good. It could also undermine trust, a fundamental aspect of human behavior that helps us form relationships. If the price of accurate judgements is the deterioration of social bonds, is it worth it?
 
And then there’s the question of accuracy. In their study, von Schenk and her colleagues were only interested in creating a tool that was better than humans at lie detection. That isn’t too difficult, given how terrible we are at it. But she also imagines a tool like hers being used to routinely assess the truthfulness of social media posts, or hunt for fake details in a job hunter’s resume or interview responses. In cases like these, it’s not enough for a technology to just be “better than human” if it’s going to be making more accusations. 
 
Would we be willing to accept an accuracy rate of 80%, where only four out of every five assessed statements would be correctly interpreted as true or false? Would even 99% accuracy suffice? I’m not sure.
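To put those percentages in perspective, here is a rough back-of-the-envelope calculation. The figure of one million assessed statements is a hypothetical chosen purely for illustration, and the function name `expected_errors` is made up; the sketch simply multiplies the number of statements by the error rate.

```python
def expected_errors(n_statements, accuracy):
    """Expected number of misjudged statements at a given accuracy rate."""
    return n_statements * (1 - accuracy)


# Hypothetical workload: one million statements (not a figure from the study).
for acc in (0.67, 0.80, 0.99):
    print(acc, round(expected_errors(1_000_000, acc)))
```

Under that assumption, even a 99% accurate tool would misjudge about 10,000 statements in every million, and an 80% accurate one about 200,000.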
 
It’s worth remembering the fallibility of historical lie detection techniques. The polygraph was designed to measure heart rate and other signs of “arousal” because it was thought some signs of stress were unique to liars. They’re not. And we’ve known that for a long time. That’s why lie detector results are generally not admissible in US court cases. Despite that, polygraph lie detector tests have endured in some settings, and have caused plenty of harm when they’ve been used to hurl accusations at people who fail them on reality TV shows.
 
Imperfect AI tools stand to have an even greater impact because they are so easy to scale, says von Schenk. You can only polygraph so many people in a day. The scope for AI lie detection is almost limitless by comparison.
 
“Given that we have so much fake news and disinformation spreading, there is a benefit to these technologies,” says von Schenk. “However, you really need to test them—you need to make sure they are substantially better than humans.” If an AI lie detector is generating a lot of accusations, we might be better off not using it at all, she says.


Now read the rest of The Checkup

Read more from MIT Technology Review’s archive

AI lie detectors have also been developed to look for facial patterns of movement and “microgestures” associated with deception. As Jake Bittle puts it: “the dream of a perfect lie detector just won’t die, especially when glossed over with the sheen of AI.”
 
On the other hand, AI is also being used to generate plenty of disinformation. As of October last year, generative AI was already being used in at least 16 countries to “sow doubt, smear opponents, or influence public debate,” as Tate Ryan-Mosley reported.
 
The way AI language models are developed can heavily influence the way that they work. As a result, these models have picked up different political biases, as my colleague Melissa Heikkilä covered last year.
 
AI, like social media, has the potential for good or ill. In both cases, the regulatory limits we place on these technologies will determine which way the sword falls, argue Nathan E. Sanders and Bruce Schneier.
 
Chatbot answers are all made up. But there’s a tool that can give a reliability score to large language model outputs, helping users work out how trustworthy they are. Or, as Will Douglas Heaven put it in an article published a few months ago, a BS-o-meter for chatbots.

From around the web

Scientists, ethicists and legal experts in the UK have published a new set of guidelines for research on synthetic embryos, or, as they call them, “stem cell-based embryo models (SCBEMs).” There should be limits on how long they are grown in labs, and they should not be transferred into the uterus of a human or animal, the guidelines state. They also note that, if, in future, these structures look like they might have the potential to develop into a fetus, we should stop calling them “models” and instead refer to them as “embryos.”

Antimicrobial resistance is already responsible for 700,000 deaths every year, and could claim 10 million lives per year by 2050. Overuse of broad spectrum antibiotics is partly to blame. Is it time to tax these drugs to limit demand? (International Journal of Industrial Organization)

Spaceflight can alter the human brain, reorganizing gray and white matter and causing the brain to shift upwards in the skull. We need to better understand these effects, and the impact of cosmic radiation on our brains, before we send people to Mars. (The Lancet Neurology)

The vagus nerve has become an unlikely star of social media, thanks to influencers who drum up the benefits of stimulating it. Unfortunately, the science doesn’t stack up. (New Scientist)

A hospital in Texas is set to become the first in the country to enable doctors to see their patients via hologram. Crescent Regional Hospital in Lancaster has installed Holobox—a system that projects a life-sized hologram of a doctor for patient consultations. (ABC News)
