How OpenAI is trying to make ChatGPT safer and less biased

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Have you been threatened by an AI chatbot yet? Over the past week it seems like almost every news outlet has tried Microsoft’s Bing AI search and found that the chatbot makes up stupid and creepy stuff. It repeatedly told a New York Times tech columnist that it “loved” him, then claimed to be “offended” by a line of questioning in a mock interview with The Washington Post. In response, Microsoft has limited Bing to five replies per session in an effort to reduce the chances it goes off-piste.

It’s not just freaking out journalists (some of whom should really know better than to anthropomorphize and hype up a dumb chatbot’s ability to have feelings). OpenAI, the startup behind the technology, has also gotten a lot of heat from conservatives in the US who claim its chatbot ChatGPT has a “woke” bias.

All this outrage is finally having an impact. Bing’s trippy content is generated by AI language technology called ChatGPT developed by startup OpenAI, and last Friday, OpenAI issued a blog post aimed at clarifying how its chatbots should behave. It also released its guidelines on how ChatGPT should respond when asked about US “culture wars.” The rules include not affiliating with political parties or judging one group as good or bad, for example.

I spoke to Sandhini Agarwal and Lama Ahmad, two AI policy researchers at OpenAI, about how the company is making ChatGPT safer and less nuts. The company refused to comment on its relationship with Microsoft, but they still had some interesting insights. Here’s what they had to say:

How to get better answers: In AI language model research, one of the biggest open questions is how to stop the models “hallucinating,” a polite term for making stuff up. ChatGPT has been used by millions of people for months, but we haven’t seen the kind of falsehoods and hallucinations that Bing has been generating.

That’s because OpenAI has used a technique in ChatGPT called reinforcement learning from human feedback, which improves the model’s answers based on feedback from users. The technique works by asking people to pick between a range of outputs before ranking them according to criteria such as factualness and truthfulness. Some experts believe Microsoft might have skipped or rushed this stage to launch Bing, although the company has yet to confirm or deny that claim.

But that method is not perfect, according to Agarwal. People might have been presented with options that were all false, then picked the option that was the least false, she says. In an effort to make ChatGPT more reliable, the company has been focusing on cleaning up its dataset and removing examples where the model has had a preference for things that are false.
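To make the ranking step concrete, here is a minimal sketch of how a human ranking can be turned into training signal. It assumes a pairwise, Bradley-Terry style loss, which is a common formulation in published work on reinforcement learning from human feedback; the article does not say which formulation OpenAI uses. The function names, answer labels, and scores are hypothetical.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the better output comes first.
    return list(combinations(ranked_outputs, 2))


def preference_loss(score_preferred, score_rejected):
    """Pairwise loss: small when the preferred output gets the higher score."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


# Hypothetical ranking from one labeler, best first.
ranking = ["answer_a", "answer_b", "answer_c"]
pairs = ranking_to_pairs(ranking)

# Hypothetical scores from a reward model that is being trained.
scores = {"answer_a": 2.0, "answer_b": 0.5, "answer_c": -1.0}
total_loss = sum(preference_loss(scores[w], scores[l]) for w, l in pairs)
```

Note that the ranking only records which answer was preferred, not whether any answer was true. If all three answers were false, the pairs would look exactly the same, which is the weakness Agarwal describes.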

Jailbreaking ChatGPT: Since ChatGPT’s release, people have been trying to “jailbreak” it, which means finding workarounds to prompt the model to break its own rules and generate racist or conspiratorial stuff. This work has not gone unnoticed at OpenAI HQ. Agarwal says OpenAI has gone through its entire database and selected the prompts that have led to unwanted content in order to improve the model and stop it from repeating these generations.

OpenAI wants to listen: The company has said it will start gathering more feedback from the public to shape its models. OpenAI is exploring using surveys or setting up citizens’ assemblies to discuss what content should be completely banned, says Lama Ahmad. “In the context of art, for example, nudity may not be something that’s considered vulgar, but how do you think about that in the context of ChatGPT in the classroom?” she says.

Consensus project: OpenAI has traditionally used human feedback from data labelers, but recognizes that the people it hires to do that work are not representative of the wider world, says Agarwal. The company wants to expand the viewpoints and the perspectives that are represented in these models. To that end, it’s working on a more experimental project dubbed the “consensus project,” where OpenAI researchers are looking at the extent to which people agree or disagree about different things the AI model has generated. People might feel more strongly about answers to questions such as “are taxes good” versus “is the sky blue,” for example, Agarwal says.
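One simple way to put a number on how much people agree is to count how often two raters give the same label to the same answer. The sketch below is only an illustration of that idea, not a description of OpenAI's project; the function name and the ratings are made up.

```python
from itertools import combinations


def pairwise_agreement(labels):
    """Fraction of rater pairs that gave the same label (1.0 = full consensus)."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        raise ValueError("need at least two ratings")
    return sum(a == b for a, b in pairs) / len(pairs)


# Made-up ratings from four hypothetical raters per question.
ratings = {
    "is the sky blue": ["yes", "yes", "yes", "yes"],
    "are taxes good": ["yes", "no", "no", "yes"],
}

agreement = {q: pairwise_agreement(labels) for q, labels in ratings.items()}
```

With these made-up ratings, the sky question gets full agreement while the taxes question gets agreement from only a third of rater pairs, the kind of gap the researchers are interested in.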

A customized chatbot is coming: Ultimately, OpenAI believes it might be able to train AI models to represent different perspectives and worldviews. So instead of a one-size-fits-all ChatGPT, people might be able to use it to generate answers that align with their own politics. “That’s where we’re aspiring to go to, but it’s going to be a long, difficult journey to get there because we realize how challenging this domain is,” says Agarwal.

Here’s my two cents: It’s a good sign that OpenAI is planning to invite public participation in determining where ChatGPT’s red lines might be. A bunch of engineers in San Francisco can’t, and frankly shouldn’t, determine what is acceptable for a tool used by millions of people around the world in very different cultures and political contexts. I’ll be very interested in seeing how far they will be willing to take this political customization. Will OpenAI be okay with a chatbot that generates content that represents extreme political ideologies? Meta has faced harsh criticism after allowing the incitement of genocide in Myanmar on its platform, and increasingly, OpenAI is dabbling in the same murky pond. Sooner or later, it’s going to realize how enormously complex and messy the world of content moderation is.

Deeper Learning

AI is dreaming up drugs that no one has ever seen. Now we’ve got to see if they work.

Hundreds of startups are exploring the use of machine learning in the pharmaceutical industry. The first drugs designed with the help of AI are now in clinical trials, the rigorous tests done on human volunteers to see if a treatment is safe, and really works, before regulators clear it for widespread use.

Why this matters: Today, on average, it takes more than 10 years and billions of dollars to develop a new drug. The vision is to use AI to make drug discovery faster and cheaper. By predicting how potential drugs might behave in the body and discarding dead-end compounds before they leave the computer, machine-learning models can cut down on the need for painstaking lab work. Read more from Will Douglas Heaven here.

Bits and Bytes

The ChatGPT-fueled battle for search is bigger than Microsoft or Google
It’s not just Big Tech that’s trying to make AI-powered search happen. Will Douglas Heaven looks at a slew of startups trying to reshape search—for better or worse. (MIT Technology Review)

A new tool could help artists protect their work from AI art generators
Artists have been criticizing image-making AI systems for stealing their work. Researchers at the University of Chicago have developed a tool called Glaze that adds a sort of cloak to images that will stop AI models from learning a particular artist’s style. This cloak will be invisible to the human eye, but it will distort the way AI models pick up the image. (The New York Times)

A new African startup wants to build a research lab to lure back talent
This is cool. South African AI research startup Lelapa wants to convince Africans working in tech jobs overseas to quit and move back home to work on problems that serve African businesses and communities. (Wired)

An elite law firm is going to use AI chatbots to draft documents
British law firm Allen & Overy has announced it is going to use an AI chatbot called Harvey to help its lawyers draft contracts. Harvey was built using the same tech as OpenAI’s ChatGPT. The firm’s lawyers have been warned that they need to fact-check any information Harvey generates. Let’s hope they listen, or this could get messy. (The Financial Times)

Inside the ChatGPT race in China
In the last week, almost every major Chinese tech company has announced plans to introduce its own ChatGPT-like product, reports my colleague Zeyi Yang in his newsletter about Chinese tech. But a Chinese ChatGPT alternative won’t pop up overnight, even though many companies may want you to think so. (MIT Technology Review)