How to spot AI-generated text

This sentence was written by an AI—or was it? OpenAI’s new chatbot, ChatGPT, presents us with a problem: How will we know whether what we read online is written by a human or a machine?

Since it was released in late November, ChatGPT has been used by over a million people. It has the AI community enthralled, and it is clear the internet is increasingly being flooded with AI-generated text. People are using it to come up with jokes, write children’s stories, and craft better emails.

ChatGPT is OpenAI’s spin-off of its large language model GPT-3, which generates remarkably human-sounding answers to the questions it is asked. The magic, and the danger, of these large language models lies in the illusion of correctness. The sentences they produce look right: they use the right kinds of words in the right order. But the AI doesn’t know what any of it means. These models work by predicting the most likely next word in a sentence. They haven’t a clue whether something is true or false, and they confidently present information as true even when it is not.
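To make “predicting the most likely next word” concrete, here is a minimal sketch under heavy simplifying assumptions: a toy bigram model that counts which word follows which in a tiny made-up text and always picks the most frequent follower. Real models like GPT-3 use neural networks over subword tokens and usually sample rather than always taking the top choice; the training text and function names here are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(text: str) -> dict[str, Counter]:
    """Count how often each word follows each other word in the training text."""
    words = text.lower().split()
    followers: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        followers[prev][nxt] += 1
    return followers

def generate(followers: dict[str, Counter], start: str, length: int) -> list[str]:
    """Greedily append the most frequent follower of the last word (ties go to the first seen)."""
    out = [start]
    for _ in range(length):
        options = followers.get(out[-1])
        if not options:
            break  # nothing ever followed this word in the training text
        out.append(options.most_common(1)[0][0])
    return out

# A tiny made-up training text.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)
print(" ".join(generate(model, "the", 5)))  # prints: the cat sat on the cat
```

Each word is a plausible follower of the one before it, yet the output ends by claiming the cat sat on the cat: fluent on the surface, with no notion of whether it makes sense.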

In an already polarized, politically fraught online world, these AI tools could further distort the information we consume. If they are rolled out in real products, the consequences could be devastating.

We’re in desperate need of ways to differentiate between human- and AI-written text in order to counter potential misuses of the technology, says Irene Solaiman, policy director at the AI startup Hugging Face. Solaiman previously worked as an AI researcher at OpenAI, where she studied AI output detection for the release of GPT-2, GPT-3’s predecessor.

New tools will also be crucial to enforcing bans on AI-generated text and code, like the one recently announced by Stack Overflow, a website where coders can ask for help. ChatGPT can confidently regurgitate answers to software problems, but it’s not foolproof. Getting code wrong can lead to buggy and broken software, which is expensive and potentially chaotic to fix.

A spokesperson for Stack Overflow says that the company’s moderators are “examining thousands of submitted community member reports via a number of tools including heuristics and detection models” but would not go into more detail.

In reality, detecting AI-generated text is incredibly difficult, and the ban is likely to be almost impossible to enforce.

Today’s detection tool kit

There are various ways researchers have tried to detect AI-generated text. One common method is to use software to analyze different features of the text—for example, how fluently it reads, how frequently certain words appear, or whether there are patterns in punctuation or sentence length.
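As a rough illustration of feature-based analysis, the sketch below computes a few surface statistics of the kinds mentioned above: sentence length and how much it varies, punctuation per word, and vocabulary variety. The feature set, the crude sentence-splitting rule, and the function name are assumptions for illustration; a real detector would compute many more features and learn from labeled examples how to weigh them.

```python
import re
import statistics

def style_features(text: str) -> dict[str, float]:
    """Compute a few surface features of a (non-empty) text."""
    # Crude heuristic: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text)
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    punctuation = re.findall(r"[,;:!?.\-]", text)
    return {
        "mean_sentence_len": statistics.mean(lengths),
        # Population standard deviation: 0 when every sentence has the same length.
        "sentence_len_stdev": statistics.pstdev(lengths),
        "punct_per_word": len(punctuation) / len(words),
        # Distinct words divided by total words: higher means more varied vocabulary.
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
    }

print(style_features("The cat sat. The dog ran far away! Why?"))
```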

“If you have enough text, a really easy cue is the word ‘the’ occurs too many times,” says Daphne Ippolito, a senior research scientist at Google Brain, the company’s research unit for deep learning.

Because large language models work by predicting the next word in a sentence, they are more likely to use common words like “the,” “it,” or “is” instead of wonky, rare words. This is exactly the kind of text that automated detector systems are good at picking up, Ippolito and a team of researchers at Google found in research they published in 2019.
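The “the” cue can be boiled down to a single number: the share of words in a passage that are very common function words. The minimal sketch below assumes a small hand-picked word list; the list and the function name are hypothetical, and on its own the rate means little until it is compared with rates from known human-written text of similar length and genre.

```python
import re

# Hypothetical hand-picked list of very common English words.
COMMON_WORDS = {"the", "it", "is", "a", "of", "and", "to", "in"}

def common_word_rate(text: str) -> float:
    """Return the fraction of word tokens in `text` that appear in COMMON_WORDS."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0  # no words, no evidence either way
    return sum(w in COMMON_WORDS for w in words) / len(words)

print(common_word_rate("The cat is in the house and it is warm."))  # prints: 0.7
print(common_word_rate("Quixotic felines nap beneath gnarled banyans."))  # prints: 0.0
```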

But Ippolito’s study also showed something interesting: the human participants tended to think this kind of “clean” text looked better and contained fewer mistakes, and thus that it must have been written by a person.

In reality, human-written text is riddled with typos and is incredibly variable, incorporating different styles and slang, while “language models very, very rarely make typos. They’re much better at generating perfect texts,” Ippolito says.

“A typo in the text is actually a really good indicator that it was human written,” she adds.
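A crude version of the typo cue is to list words that don’t appear in a dictionary. In the sketch below, the tiny stand-in dictionary and the function name are hypothetical. Names, slang, and new coinages also fall outside dictionaries, so a hit is a weak hint of a human writer, not proof.

```python
import re

def possible_typos(text: str, dictionary: set[str]) -> list[str]:
    """Return the words in `text` that are missing from `dictionary` (candidate typos)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in dictionary]

# Hypothetical stand-in dictionary; a real check would load a full word list.
DICTIONARY = {"i", "think", "the", "meeting", "is", "on", "tuesday"}
print(possible_typos("I thnik the meeting is on Tuesday", DICTIONARY))  # prints: ['thnik']
```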

Large language models themselves can also be used to detect AI-generated text. One of the most successful ways to do this is to retrain a model on some texts written by humans and others created by machines, so that it learns to differentiate between the two, says Muhammad Abdul-Mageed, the Canada Research Chair in natural-language processing and machine learning at the University of British Columbia, who has studied detection.
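The approach Abdul-Mageed describes retrains a large language model, which is far too heavy to show here. The sketch below swaps in a much simpler classifier, multinomial naive Bayes over word counts, to illustrate the same underlying idea: learn from texts labeled “human” or “machine,” then label new text. The class name and the four training sentences are made up for illustration, and four examples are far too few for any real use.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesDetector:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesDetector":
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text: str) -> str:
        scores = {}
        n_docs = sum(self.doc_counts.values())
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Log prior plus the smoothed log likelihood of every word in the text.
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(text):
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)

# Made-up training examples.
texts = [
    "lol cant make it tmrw sry",
    "ugh my train was late again",
    "It is important to note that the results may vary.",
    "In conclusion, it is essential to consider the context.",
]
labels = ["human", "human", "machine", "machine"]
detector = NaiveBayesDetector().fit(texts, labels)
print(detector.predict("it is important to consider the results"))  # prints: machine
print(detector.predict("sry train late lol"))  # prints: human
```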

Meanwhile, Scott Aaronson, a computer scientist at the University of Texas who is on a one-year secondment as a researcher at OpenAI, has been developing watermarks for longer pieces of text generated by models such as GPT-3: “an otherwise unnoticeable secret signal in its choices of words, which you can use to prove later that, yes, this came from GPT,” he writes on his blog.
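The article does not describe how Aaronson’s watermark works, so the sketch below swaps in a different, simpler published idea, a “green list” watermark, applied to a toy word-level generator. At each step, a secret key and the previous word pseudorandomly split the vocabulary in half, and the watermarking generator picks only from the “green” half. A detector holding the key counts green words and measures how far that count sits above chance. The key, the vocabulary, and all function names are hypothetical.

```python
import hashlib
import math
import random

# Hypothetical 16-word vocabulary for a toy word-level "model".
VOCAB = ["apple", "river", "stone", "cloud", "green", "quick", "light", "paper",
         "music", "ocean", "table", "dream", "field", "glass", "smile", "train"]

def green_list(prev_word: str, key: str) -> set[str]:
    """Pseudorandomly choose half the vocabulary, seeded by the secret key and previous word."""
    digest = hashlib.sha256(f"{key}|{prev_word}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: len(VOCAB) // 2])

def generate(n: int, key: str, watermark: bool, seed: int = 0) -> list[str]:
    """Toy generator: picks words at random; when watermarking, only from the green list."""
    rng = random.Random(seed)
    words = [rng.choice(VOCAB)]
    while len(words) < n:
        if watermark:
            # sorted() gives a fixed order, since set iteration order can vary between runs.
            words.append(rng.choice(sorted(green_list(words[-1], key))))
        else:
            words.append(rng.choice(VOCAB))
    return words

def watermark_z_score(words: list[str], key: str) -> float:
    """Standard deviations by which the green-word count exceeds the 50% expected by chance."""
    t = len(words) - 1  # the first word has no predecessor; needs at least 2 words
    green = sum(w in green_list(prev, key) for prev, w in zip(words, words[1:]))
    return (green - 0.5 * t) / math.sqrt(0.25 * t)

key = "hypothetical-secret-key"
print(watermark_z_score(generate(101, key, watermark=True), key))   # prints: 10.0
print(watermark_z_score(generate(101, key, watermark=False), key))  # close to 0
```

Only someone holding the key can run the check. A practical scheme would gently bias a language model’s word probabilities rather than force every word into the green list as this toy does, which helps keep the signal unnoticeable to readers.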

A spokesperson for OpenAI confirmed that the company is working on watermarks, and said its policies state that users should clearly indicate text generated by AI “in a way no one could reasonably miss or misunderstand.”

But these technical fixes come with big caveats. Most of them don’t stand a chance against the latest generation of AI language models, because they are built on GPT-2 or other earlier models. Many of these detection tools work best when there is a lot of text available; they will be less effective in some concrete use cases, like chatbots or email assistants, which rely on shorter conversations and provide less data to analyze. And using large language models for detection also requires powerful computers, as well as access to the AI model itself, which tech companies don’t allow, Abdul-Mageed says.
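A rough worked example, under assumed numbers, shows why length matters for statistical cues like word frequency. Suppose a detector compares how often a word appears in a passage with its rate in typical human writing; the rates and the function name below are hypothetical. The standard error of a measured rate shrinks only with the square root of the passage length, so the same one-point gap is lost in noise at chatbot-message length and becomes conspicuous only in long documents.

```python
import math

def rate_difference_z(observed_rate: float, reference_rate: float, n_words: int) -> float:
    """z-score of a word rate observed in n_words of text against a reference rate."""
    # Binomial standard error of a rate estimated from n_words tokens.
    standard_error = math.sqrt(reference_rate * (1 - reference_rate) / n_words)
    return (observed_rate - reference_rate) / standard_error

# Hypothetical rates: a word makes up 6% of the text under test vs. 5% of typical human writing.
for n in (50, 500, 5000):
    print(n, round(rate_difference_z(0.06, 0.05, n), 2))
# prints:
# 50 0.32
# 500 1.03
# 5000 3.24
```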

The bigger and more powerful the model, the harder it is to build AI models to detect what text is written by a human and what isn’t, says Solaiman.

“What’s so concerning now is that [ChatGPT has] really impressive outputs. Detection models just can’t keep up. You’re playing catch-up this whole time,” she says.

Training the human eye

There is no silver bullet for detecting AI-written text, says Solaiman. “A detection model is not going to be your answer for detecting synthetic text in the same way that a safety filter is not going to be your answer for mitigating biases,” she says.

To have a chance of solving the problem, we’ll need improved technical fixes and more transparency around when humans are interacting with an AI, and people will need to learn to spot the signs of AI-written sentences.

“What would be really nice to have is a plug-in to Chrome or to whatever web browser you’re using that will let you know if any text on your web page is machine generated,” Ippolito says.

Some help is already out there. Researchers at Harvard and IBM developed a tool called Giant Language Model Test Room (GLTR), which supports humans by highlighting passages that might have been generated by a computer program.
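GLTR’s core idea is to check how predictable each word is: it ranks every word by how highly a language model (GPT-2) predicted it in context and highlights words that fall in the top 10, top 100, and top 1,000 predictions, since machine-written text leans heavily on top-ranked words. The sketch below is a much cruder stand-in that swaps the language model for a context-free word-frequency ranking built from a reference text; the function name, bucket labels, and toy inputs are hypothetical.

```python
import re
from collections import Counter

def rank_buckets(text: str, reference_corpus: str,
                 cutoffs: tuple[int, ...] = (10, 100, 1000)) -> list[tuple[str, str]]:
    """Label each word of `text` by its frequency rank in `reference_corpus`."""
    freq = Counter(re.findall(r"[a-z']+", reference_corpus.lower()))
    # Rank 1 = most frequent word; ties keep first-seen order.
    rank = {word: i + 1 for i, (word, _) in enumerate(freq.most_common())}
    labels = []
    for word in re.findall(r"[a-z']+", text.lower()):
        r = rank.get(word)
        if r is None:
            bucket = "unseen"
        else:
            # The first cutoff the rank falls under, or "rare" if it is beyond them all.
            bucket = next((f"top {c}" for c in cutoffs if r <= c), "rare")
        labels.append((word, bucket))
    return labels

print(rank_buckets("the cat sat on the mat", "the the the cat cat sat mat", cutoffs=(1, 3)))
```

A passage in which nearly every word lands in the top buckets is, by this reasoning, more likely to be machine-generated; like GLTR, the sketch only highlights and leaves the judgment to the reader.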

But AI is already fooling us. Researchers at Cornell University found that people considered fake news articles generated by GPT-2 credible about 66% of the time.

Another study found that untrained humans were able to correctly spot text generated by GPT-3 only at a level consistent with random chance.

The good news is that people can be trained to be better at spotting AI-generated text, Ippolito says. She built a game to test how many sentences a computer can generate before a player catches on that it’s not human, and found that people got gradually better over time.

“If you look at lots of generative texts and you try to figure out what doesn’t make sense about it, you can get better at this task,” she says. One way is to pick up on implausible statements, like the AI saying it takes 60 minutes to make a cup of coffee.

GPT-3, ChatGPT’s predecessor, has only been around since 2020. OpenAI says ChatGPT is a demo, but it is only a matter of time before similarly powerful models are developed and rolled out into products such as chatbots for use in customer service or health care. And that’s the crux of the problem: the speed of development in this sector means that every way to spot AI-generated text becomes outdated very quickly. It’s an arms race—and right now, we’re losing.