Why watermarking AI-generated content won’t guarantee trust online

In late May, the Pentagon appeared to be on fire.

A few miles away, White House aides and reporters scrambled to figure out whether a viral online image of the exploding building was in fact real.

It wasn’t. It was AI-generated. Yet government officials, journalists, and tech companies were unable to take action before the image had real impact. It not only caused confusion but led to a dip in financial markets.

Manipulated and misleading content is not a new phenomenon. But AI enables increasingly accessible, sophisticated, and hyperrealistic content creation that—while it can be used for good, in artistic expression or accessibility improvements—can also be abused to cast doubt on political events, or to defame, harass, and exploit.

Whether to promote election integrity, protect evidence, reduce misinformation, or preserve historical records, audiences could benefit from knowing when content has been manipulated or generated with AI. Had the Pentagon image contained signs that it was AI-generated, technology platforms might have been able to take action more quickly; they could have promptly reduced its distribution or perhaps labeled the content so that audiences might have been able to more easily identify it as fake. Confusion, and by extension market movement, might have been avoided.

There’s no question that we need more transparency if we’re going to be able to differentiate between what is real and what is synthetic. Last month, the White House weighed in on how to do this, announcing that seven of the most prominent AI companies have committed to “develop robust technical measures to ensure that users know when content is AI-generated, such as watermarking.”

Disclosure methods like watermarks are a good start. However, they’re complicated to put into practice, and they aren’t a quick fix. It’s unclear whether watermarks would have helped Twitter users recognize the fake image of the Pentagon or, more recently, identify Donald Trump’s voice in an ad campaign as synthetic. Might other methods, such as provenance disclosure and metadata, have more impact? And most important, would merely disclosing that content was AI-generated help audiences differentiate fact from fiction, or mitigate real-world harm?

To begin to answer these questions, we need to clarify what we mean by watermarking and other types of disclosure methods. It needs to be clear what they are, what we can reasonably expect them to do, and what problems remain even after they’re introduced. Although definitional debates can seem pedantic, the broad use of the term “watermark” is currently contributing to confusion and a lack of coordination across the AI sector. Defining what we mean by these different methods is a crucial prerequisite for the AI field to work together and agree on standards for disclosure. Otherwise, people are talking at cross-purposes.

I’ve observed this problem firsthand while leading the nonprofit Partnership on AI (PAI) in its multi-sector work to develop guidelines for responsible synthetic media, with commitment from organizations like OpenAI, Adobe, Witness, Microsoft, the BBC, and others.

On the one hand, watermarking can refer to signals that are visible to end users (for example, the “Getty Images” text emblazoned on the image supplier’s media). However, it can also be used to mean technical signals embedded in content that are imperceptible to the naked eye or ear. Both types of watermarks, described respectively as “direct” and “indirect” disclosure, are vital to get right to ensure transparency. Any conversation about the challenges and opportunities in watermarking, then, must highlight which type of watermarking is being evaluated.

Further complicating matters, watermarking is often used as a “catch-all” term for the general act of providing content disclosures, even though there are many methods. A closer read of the White House commitments reveals another method for disclosure, known as provenance, which relies on cryptographic signatures rather than invisible signals. However, this is often described as watermarking in the popular press. If you find this mishmash of terms confusing, rest assured you’re not the only one. But clarity matters: the AI sector cannot implement consistent and robust transparency measures if there is not even agreement on how we refer to the different techniques.
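To make the distinction concrete, here is a minimal sketch of the provenance idea: a signed record that travels alongside the content rather than a signal hidden inside it. Everything in it is an assumption made for illustration, including the function names, the manifest fields, and the use of a shared-secret HMAC from Python's standard library. Real provenance systems typically rely on public-key signatures so that anyone can verify a record without holding the signing key.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret for this sketch only. Real provenance systems
# would use public-key signatures instead of a shared key.
SIGNING_KEY = b"demo-key-not-for-production"


def make_manifest(content: bytes, generator: str) -> dict:
    """Build a signed record describing how the content was made."""
    claims = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "generator": generator,  # illustrative field: name of the AI tool
    }
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": signature}


def verify_manifest(content: bytes, manifest: dict) -> bool:
    """Return True only if the record is authentic and matches the content."""
    payload = json.dumps(manifest["claims"], sort_keys=True).encode("utf-8")
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, manifest["signature"])
    content_hash = hashlib.sha256(content).hexdigest()
    content_ok = manifest["claims"]["content_sha256"] == content_hash
    return signature_ok and content_ok


# Stand-in bytes; a real example would read an actual image file.
image_bytes = b"stand-in for real image data"
manifest = make_manifest(image_bytes, generator="example-image-model")

print(verify_manifest(image_bytes, manifest))            # True
print(verify_manifest(image_bytes + b"edit", manifest))  # False
```

Note what the sketch does and does not show. The signature proves that the record is intact and that it matches this exact file, but nothing in the file itself is altered, which is why provenance is a different technique from an invisible watermark.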

I’ve come up with six initial questions that could help us evaluate the usefulness of watermarks and other disclosure methods for AI. These should help make sure different parties are discussing the exact same thing, and that we can evaluate each method in a thorough, consistent manner.

Can the watermark itself be tampered with?

Ironically, the technical signals touted as helpful for gauging where content comes from and how it is manipulated can sometimes be manipulated themselves. Although doing so is difficult, both invisible and visible watermarks can be removed or altered, rendering them useless for telling us what is and isn’t synthetic. And notably, the ease with which they can be manipulated varies according to what type of content you’re dealing with.

Is the watermark’s durability consistent for different content types?

While invisible watermarking is often promoted as a broad solution for dealing with generative AI, such embedded signals are much more easily manipulated in text than in audiovisual content. That likely explains why the White House’s summary document suggests that watermarking would be applied to all types of AI, while the full text makes clear that companies only committed to disclosures for audiovisual material. AI policymaking must therefore be specific about how disclosure techniques like invisible watermarking vary in their durability and broader technical robustness across different content types. One disclosure solution may be great for images, but useless for text.
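A toy example gives a feel for how fragile a hidden signal in text can be. The sketch below hides bits in a sentence using zero-width Unicode characters and then shows that a trivial cleanup step erases them. This scheme is an assumption chosen for simplicity, not a description of how any company watermarks text, and the function names are hypothetical. Watermarks proposed for language models are generally statistical rather than character-based, and the sketch does not measure how text compares with images or audio.

```python
# Zero-width characters used as invisible "bits" (an assumption of this sketch).
ZERO = "\u200b"  # zero-width space      -> bit 0
ONE = "\u200c"   # zero-width non-joiner -> bit 1


def embed(text: str, bits: str) -> str:
    """Hide one bit after each of the first len(bits) words."""
    words = text.split(" ")
    if len(bits) > len(words):
        raise ValueError("text is too short to carry this many bits")
    marked = []
    for i, word in enumerate(words):
        if i < len(bits):
            word += ONE if bits[i] == "1" else ZERO
        marked.append(word)
    return " ".join(marked)


def extract(text: str) -> str:
    """Read back whatever hidden bits are still present."""
    return "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))


def strip_invisible(text: str) -> str:
    """Simulate a trivial cleanup step, such as sanitizing or retyping text."""
    return "".join(ch for ch in text if ch not in (ZERO, ONE))


original = "This paragraph was written by a language model"
marked = embed(original, "1011")
cleaned = strip_invisible(marked)

print(extract(marked))      # 1011
print(extract(cleaned))     # prints an empty line: the signal is gone
print(cleaned == original)  # True
```

The marked sentence looks identical to a reader, yet a single pass that drops two characters removes the signal without changing a visible letter.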

Who can detect these invisible signals?

Even if the AI sector agrees to implement invisible watermarks, deeper questions are inevitably going to emerge around who has the capacity to detect these signals and eventually make authoritative claims based on them. Who gets to decide whether content is AI-generated, and perhaps by extension, whether it is misleading? If everyone can detect watermarks, that might render them susceptible to misuse by bad actors. On the other hand, controlled access to detection of invisible watermarks, especially if it is dictated by large AI companies, might degrade openness and entrench technical gatekeeping. Implementing these sorts of disclosure methods without working out how they’re governed could leave them distrusted and ineffective. And if the techniques are not widely adopted, bad actors might turn to open-source technologies that lack the invisible watermarks to create harmful and misleading content.

Do watermarks preserve privacy?

As key work from Witness, a human rights and technology group, makes clear, any tracing system that travels with a piece of content over time might also introduce privacy issues for those creating the content. The AI sector must ensure that watermarks and other disclosure techniques are designed in a manner that does not include identifying information that might put creators at risk. For example, a human rights defender might capture abuses through photographs that are watermarked with identifying information, making the person an easy target for an authoritarian government. Even the knowledge that watermarks could reveal an activist’s identity might have chilling effects on expression and speech. Policymakers must provide clearer guidance on how disclosures can be designed so as to preserve the privacy of those creating content, while also including enough detail to be useful and practical.

Do visible disclosures help audiences understand the role of generative AI?

Even if invisible watermarks are technically durable and privacy preserving, they might not help audiences interpret content. Though direct disclosures like visible watermarks have an intuitive appeal for providing greater transparency, such disclosures do not necessarily achieve their intended effects, and they can often be perceived as paternalistic, biased, and punitive, even when they are not saying anything about the truthfulness of a piece of content. Furthermore, audiences might misinterpret direct disclosures. A participant in my 2021 research misinterpreted Twitter’s “manipulated media” label as suggesting that the institution of “the media” was manipulating him, not that the content of the specific video had been edited to mislead. While research is emerging on how different user experience designs affect audience interpretation of content disclosures, much of it is concentrated within large technology companies and focused on distinct contexts, like elections. Studying the efficacy of direct disclosures and user experiences, and not merely relying on the visceral appeal of labeling AI-generated content, is vital to effective policymaking for improving transparency.

Could visibly watermarking AI-generated content diminish trust in “real” content?

Perhaps the thorniest societal question to evaluate is how coordinated, direct disclosures will affect broader attitudes toward information and potentially diminish trust in “real” content. If AI organizations and social media platforms are simply labeling the fact that content is AI-generated or modified—as an understandable, albeit limited, way to avoid making judgments about which claims are misleading or harmful—how does this affect the way we perceive what we see online?

Media literacy via disclosure is a noble endeavor; yet many working on policy teams within and beyond tech companies understandably worry that a premature push to label all generated content will usher in the liar’s dividend, a dynamic in which societal skepticism of all content as potentially AI-generated is so pronounced that it undermines trust in real content that is not generated with AI. This prospect also contributes to uncertainty about whether all seemingly low-stakes uses of AI in content creation (for example, the iPhone’s portrait mode, which relies on AI techniques, or voice assistants mentioned in the White House commitments) warrant a disclosure that AI was involved. The field needs to work together to measure societal attitudes toward information over time and determine when it makes sense to disclose the involvement of AI. Most important, it must evaluate the impact of visible disclosures that merely describe the method of content creation, stating that something was generated or edited by AI, as a proxy for what we really care about: indicating whether the content’s claim is true or false.

The challenges that watermarks and other disclosure techniques pose should not be used as an excuse for inaction or for limiting transparency. Instead, they should provide an impetus for companies, policymakers, and others to work together on definitions and decide how they’ll evaluate the inevitable trade-offs involved with implementation. Only then can generative AI policies adequately help audiences differentiate fact from fabrication.

Claire Leibowicz is the Head of the AI and Media Integrity Program at the Partnership on AI and a doctoral candidate at Oxford studying AI governance and synthetic media.