How to fight for internet freedom

This article is from The Technocrat, MIT Technology Review’s weekly tech policy newsletter about power, politics, and Silicon Valley.

You may not be shocked to hear that governments are using generative AI to manipulate conversations and automatically censor what’s online. But now we have a better sense of how this is happening, when, and where: a new report shows that political actors in 16 countries, including Pakistan, Nigeria, and the United States, have used generative AI over the past year to exert increased control over the internet.

Last week, Freedom House, a human rights advocacy group, released its annual review of the state of internet freedom around the world; it’s one of the most important trackers out there if you want to understand changes to digital free expression.

As I wrote, the report shows that generative AI is already a game changer in geopolitics. But this isn’t the only concerning finding. Globally, internet freedom has never been lower, and the number of countries that have blocked websites for political, social, and religious speech has never been higher. Also, the number of countries that arrested people for online expression reached a record high.

These issues are particularly urgent before we head into a year with over 50 elections worldwide; as Freedom House has noted, election cycles are times when internet freedom is often most under threat. The organization has issued some recommendations for how the international community should respond to the growing crisis, and I also reached out to another policy expert for her perspective.

Call me an optimist, but talking with them this week made me feel like there are at least some actionable things we might do to make the internet safer and freer. Here are three key things they say tech companies and lawmakers should do:

Increase transparency around AI models
One of the primary recommendations from Freedom House is to encourage more public disclosure of how AI models were built. Large language models like ChatGPT are infamously inscrutable (you should read my colleagues’ work on this), and the companies that develop the algorithms have been resistant to disclosing information about what data they used to train their models.

“Government regulation should be aimed at delivering more transparency, providing effective mechanisms of public oversight, and prioritizing the protection of human rights,” the report says.

As governments race to keep up in a rapidly evolving space, comprehensive legislation may be out of reach. But proposals that mandate narrower requirements, such as the disclosure of training data and standardized testing for bias in outputs, could find their way into more targeted policies. (If you’re curious to know more about what the US in particular may do to regulate AI, I’ve covered that, too.)

When it comes to internet freedom, increased transparency would also help people better recognize when they are seeing state-sponsored content online, as in China, where the government requires content created by generative AI models to be favorable to the Communist Party.

Be cautious when using AI to scan and filter content
Social media companies are increasingly using algorithms to moderate what appears on their platforms. While automatic moderation helps thwart disinformation, it also risks hurting online expression.

“While corporations should consider the ways in which their platforms and products are designed, developed, and deployed so as not to exacerbate state-sponsored disinformation campaigns, they must be vigilant to preserve human rights, namely free expression and association online,” says Mallory Knodel, the chief technology officer of the Center for Democracy and Technology.

Additionally, Knodel says that when governments require platforms to scan and filter content, this often leads to algorithms that block even more content than intended.

As part of the solution, Knodel believes tech companies should find ways to “enhance human-in-the-loop features,” in which people have hands-on roles in content moderation, and “rely on user agency to both block and report disinformation.”

Develop ways to better label AI-generated content, especially content related to elections
Currently, labeling AI-generated images, video, and audio is incredibly hard to do. (I’ve written a bit about this in the past, particularly the ways technologists are trying to make progress on the problem.) But there’s no gold standard here, so misleading content, especially around elections, has the potential to do great harm.

Allie Funk, one of the researchers on the Freedom House report, told me about an example in Nigeria of an AI-manipulated audio clip in which presidential candidate Atiku Abubakar and his team could be heard saying they planned to rig the ballots. Nigeria has a history of election-related conflict, and Funk says disinformation like this “really threatens to inflame simmering potential unrest” and create “disastrous impacts.”

AI-manipulated audio is particularly hard to detect. Funk says this example is just one among many that the group chronicled that “speaks to the need for a whole host of different types of labeling.” Even if such labeling can’t be ready in time for next year’s elections, it’s critical that we start to figure it out now.

What else I’m reading

This joint investigation from Wired and the Markup showed that predictive policing software was right less than 1% of the time. The findings are damning yet not surprising: policing technology has a long history of being exposed as junk science, especially in forensics.
MIT Technology Review released our first list of climate technology companies to watch, in which we highlight companies pioneering breakthrough research. Read my colleague James Temple’s overview of the list, which makes the case for why we need to pay attention to technologies that have the potential to impact our climate crisis.
Companies that own or use generative AI might soon be able to take out insurance policies to mitigate the risk of using AI models—think biased outputs and copyright lawsuits. It’s a fascinating development in the marketplace of generative AI.

What I learned this week

A new paper from Stanford’s Journal of Online Trust and Safety highlights why content moderation in low-resource languages, which are languages without enough digitized training data to build accurate AI systems, is so poor. It also makes an interesting case about where attention should go to improve this. While social media companies ultimately need “access to more training and testing data in those languages,” it argues, a “lower-hanging fruit” could be investing in local and grassroots initiatives for research on natural-language processing (NLP) in low-resource languages.

“Funders can help support existing local collectives of language- and language-family-specific NLP research networks who are working to digitize and build tools for some of the lowest-resource languages,” the researchers write. In other words, rather than investing in collecting more data from low-resource languages for big Western tech companies, funders should spend money on local NLP projects that are developing new AI research, which could create AI well suited for those languages directly.