Next-gen content farms are using AI-generated text to spin up junk websites
This article is from The Technocrat, MIT Technology Review’s weekly tech policy newsletter about power, politics, and Silicon Valley. To receive it in your inbox every Friday, sign up here.
We’ve heard a lot about AI risks in the era of large language models like ChatGPT (including from me!)—risks such as prolific mis- and disinformation and the erosion of privacy. Back in April, my colleague Melissa Heikkilä also predicted that these new AI models would soon flood the internet with spam and scams. Today’s story explains that this new wave has already arrived, and it’s incentivized by ad money.
People are using AI to quickly spin up junk websites in order to capture some of the programmatic advertising money that’s sloshing around online, according to a new report by NewsGuard, exclusively shared with MIT Technology Review. That means that blue chip advertisers and major brands are essentially funding the next wave of content farms, likely without their knowledge.
NewsGuard, which rates the quality of websites, found over 140 major brands advertising on sites using AI-generated text that it considers "unreliable," and the ads it found come from some of the most recognized companies in the world. Ninety percent of the ads from major brands were served through Google's ad technology, despite the company's own policies that prohibit sites from placing Google-served ads on pages with "spammy automatically generated content."
The ploy works because programmatic advertising allows companies to buy ad spots on the internet without human oversight: algorithms bid on placements to optimize the number of relevant eyeballs likely to see that ad. Even before generative AI entered the scene, around 21% of ad impressions were taking place on junk "made-for-advertising" websites, wasting about $13 billion each year.
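To make that concrete, here is a toy sketch of the kind of automated auction that decides where an ad lands. The `Bid` class and `run_auction` function are hypothetical illustrations, not any real ad exchange's API, but they show why no human has to look at a page before a brand's ad ends up on it.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    advertiser: str   # brand whose ad would be shown (hypothetical field)
    cpm: float        # price offered per thousand impressions, in dollars

def run_auction(bids):
    """Pick a winner the way a simplified second-price auction would:
    the highest bidder wins but pays the runner-up's price. The auction
    only sees bids and a placement ID, never the page's editorial quality."""
    if not bids:
        return None
    ranked = sorted(bids, key=lambda b: b.cpm, reverse=True)
    winner = ranked[0]
    clearing_price = ranked[1].cpm if len(ranked) > 1 else winner.cpm
    return winner, clearing_price

# The page requesting the ad could just as easily be an AI-generated junk site.
bids = [Bid("BrandA", 4.20), Bid("BrandB", 3.80), Bid("BrandC", 2.10)]
print(run_auction(bids))  # BrandA wins and pays 3.80
```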
Now, people are using generative AI to make sites that capture ad dollars. NewsGuard has tracked over 200 "unreliable AI-generated news and information sites" since April 2023, and most seem to be seeking to profit off advertising money, often from reputable companies.
NewsGuard identifies these websites by using AI to check whether they contain text that matches the standard error messages from large language models like ChatGPT. Those flagged are then reviewed by human researchers.
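For illustration, a minimal sketch of that kind of check might look like the following. The phrase list and the `flag_page` function are assumptions for the sake of example, not NewsGuard's actual methodology, and a match only routes a page to a human reviewer rather than labeling it unreliable outright.

```python
import re

# Canned LLM error and refusal text that sometimes gets published unedited.
# The exact strings NewsGuard looks for aren't public; these are illustrative.
TELLTALE_PHRASES = [
    "as an ai language model",
    "i cannot fulfill this request",
    "my knowledge cutoff",
    "i'm sorry, but as an ai",
]

def flag_page(text):
    """Return any telltale phrases found in a page's text.
    A non-empty result means the page should go to a human reviewer."""
    lowered = re.sub(r"\s+", " ", text.lower())
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]

sample = "Sorry, as an AI language model, I cannot complete this article."
print(flag_page(sample))  # ['as an ai language model']
```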
Most of the websites’ creators are completely anonymous, and some sites even feature fake, AI-generated creator bios and photos.
As Lorenzo Arvanitis, a researcher at NewsGuard, told me, “This is just kind of the name of the game on the internet.” Often, perfectly well-meaning companies end up paying for junk—and sometimes inaccurate, misleading, or fake—content because they are so keen to compete for online user attention. (There’s been some good stuff written about this before.)
The big story here is that generative AI is being used to supercharge this whole ploy, and it’s likely that this phenomenon is “going to become even more pervasive as these language models become more advanced and accessible,” according to Arvanitis.
And though we can expect it to be used by malign actors in disinformation campaigns, we shouldn’t overlook the less dramatic but perhaps more likely consequence of generative AI: huge amounts of wasted money and resources.
What else I’m reading
- Chuck Schumer, the Senate majority leader in the US Congress, unveiled a plan for AI regulation in a speech last Wednesday, saying that innovation ought to be the “North Star” in legislation. President Biden also met with some AI experts in San Francisco last week, in another signal that regulatory action could be around the corner, but I’m not holding my breath.
- Political campaigns are using generative AI, setting off alarm bells about disinformation, according to this great overview from the New York Times. “Political experts worry that artificial intelligence, when misused, could have a corrosive effect on the democratic process,” reporters Tiffany Hsu and Steven Lee Myers write.
- Last week, Meta’s oversight board issued binding recommendations about how the company moderates content around war. The company will have to provide additional information about why material is left up or taken down, and it must preserve anything that documents human rights abuses and share that documentation with authorities when appropriate. Alexa Koenig, the executive director of the Human Rights Center, wrote a sharp analysis for Tech Policy Press explaining why this is actually a pretty big deal.
What I learned this week
The science about the relationship between social media and mental health for teens is still pretty complicated. A few weeks ago, Kaitlyn Tiffany at the Atlantic wrote a really in-depth feature, surveying the existing, and sometimes conflicting, research in the field. Teens are indeed experiencing a sharp increase in mental-health issues in the United States, and social media is often considered a contributing factor to the crisis.
The science, however, is not as clear or illuminating as we might hope, and just exactly how and when social media is damaging is not yet well established in the research. Tiffany writes that “a decade of work and hundreds of studies have produced a mixture of results, in part because they’ve used a mixture of methods and in part because they’re trying to get at something elusive and complicated.” Importantly, “social media’s effects seem to depend a lot on the person using it.”