
Durable Content Credentials

Ensuring reliable provenance information is available no matter where a piece of content goes by combining C2PA metadata, watermarking, and fingerprinting technology.

by Andy Parsons, Sr. Director, Content Authenticity Initiative

Faced with a newly chaotic media landscape of generative AI and other heavily manipulated content alongside authentic photographs, video, and audio, it is becoming increasingly difficult to know what to trust.

Understanding the origins of digital media, including whether and how it was manipulated, and sharing that information with the consumer is now possible through Content Credentials, the global open technical specification developed by the C2PA, a consortium of over 100 companies working together within the Linux Foundation.

Implementation of Content Credentials is on the rise, with in-product support released or soon to be released by Adobe, OpenAI, Meta, Google, Sony, Leica, Microsoft, Truepic, and many other companies.

As this technology becomes increasingly commonplace, we’re seeing criticism circulating that relying solely on Content Credentials’ secure metadata, or solely on invisible watermarking to label generative AI content, may not be sufficient to prevent the spread of misinformation. 

To be clear, we agree. 

That is why, since its founding in 2021, the C2PA has been hard at work creating a robust and secure open standard in Content Credentials. While the standard focuses on a new kind of “signed” metadata, it also specifies measures to make the metadata durable, or able to persist in the face of screenshots and rebroadcast attacks. 

Content Credentials are sometimes confusingly described as a type of watermark, but watermarking has a specific meaning in this context and is only one piece in the three-pronged approach represented by Content Credentials. Let’s clarify all of this. 

The promise of Content Credentials is that they can combine secure metadata, undetectable watermarks, and content fingerprinting to offer the most comprehensive solution available for expressing content provenance for audio, video, and images.

  • Secure metadata: This is verifiable information about how content was made that is baked into the content itself, in a way that cannot be altered without leaving evidence of alteration. A Content Credential can tell us about the provenance of any media or composite. It can tell us whether a video, image, or sound file was created with AI or captured in the real world with a device like a camera or audio recorder. Because Content Credentials are designed to be chained together, they can indicate how content may have been altered, what content was combined to produce the final content, and even what device or software was involved in each stage of production. The various provenance bits can be combined in ways that preserve privacy and enable creators, fact checkers, and information consumers to decide what’s trustworthy, what’s not, and what may be satirical or purely creative.   

  • Watermarking: This term is often used in a generic way to refer to data that is permanently attached to content and hard or impossible to remove. For our purposes here, I specifically refer to watermarking as a kind of hidden information that is not detectable by humans. It embeds a small amount of information in content that can be decoded using a watermark detector. State-of-the-art watermarks can be impervious to alterations such as the cropping or rotating of images or the addition of noise to video and audio. Importantly, the strength of a watermark is that it can survive rebroadcasting efforts like screenshotting, pictures of pictures, or re-recording of media, which effectively remove secure metadata.

  • Fingerprinting: This is a way to create a unique code based on pixels, frames, or audio waveforms that can be computed and matched against other instances of the same content, even if there has been some degree of alteration. Think of the way your favorite music-matching service works, locating a specific song from an audio sample you provide. The fingerprint can be stored separately from the content as part of the Content Credential. When someone encounters the content, the fingerprint can be re-computed on the fly and matched against a database of Content Credentials and their associated stored fingerprints. The advantage of this technique is that it does not require embedding any information in the media itself. It is immune to information removal because there is no information to remove.
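
To make the fuzzy-matching idea concrete, here is a minimal "average hash" sketch in Python. It is not the fingerprinting algorithm any Content Credentials implementation actually uses; it only shows how similar content can yield similar codes, so that matching becomes a Hamming-distance test rather than an exact comparison. The function names and the threshold value are assumptions for illustration.

```python
# A minimal perceptual "average hash" sketch (illustrative only; real
# fingerprinting systems use far more robust descriptors).
def average_hash(gray_pixels: list[float]) -> int:
    """Turn an 8x8 grayscale thumbnail (64 values) into a 64-bit code."""
    block = gray_pixels[:64]
    avg = sum(block) / len(block)
    code = 0
    for value in block:
        code = (code << 1) | (1 if value >= avg else 0)
    return code

def hamming_distance(a: int, b: int) -> int:
    """Count how many bits differ between two codes."""
    return bin(a ^ b).count("1")

def is_match(a: int, b: int, threshold: int = 10) -> bool:
    """Fuzzy match: a few differing bits still count as the same content,
    so a lightly edited copy usually matches while unrelated content does not."""
    return hamming_distance(a, b) <= threshold
```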

So, we have three techniques that can be used to inform consumers about how media came to be. If each of these techniques were robust enough to ensure the availability of rich provenance no matter where the content goes, we would have a versatile set of measures, each of which could be applied where optimal and as appropriate.  

However, none of these techniques is durable enough in isolation to be effective on its own. Consider: 

  • Even if Content Credentials metadata cannot be tampered with without detection, metadata of any kind can be removed deliberately or accidentally. 

  • Watermarking is limited by the amount of data that can be encoded without visibly or audibly degrading the content, and even then, watermarks can be removed or spoofed. 

  • Fingerprint retrieval is fuzzy. Matches cannot be made with perfect certainty, so while fingerprints are useful as a perceptual check, they are not exact enough to tie content to stored provenance with full confidence. 

But combined into a single approach, the three form a unified solution that is robust and secure enough to ensure that reliable provenance information is available no matter where a piece of content goes. This single, harmonized approach is the essence of durable Content Credentials.  

Here is a deeper dive into how C2PA metadata, watermarks, and fingerprints are bound to the content to achieve permanent, immutable provenance. The thoughtful combination of these techniques leverages the strengths of each to mitigate the shortcomings of the others.  

A simple comparison of the components of durable Content Credentials, and their strength in combination.

Let’s look at how this works. First, the content is watermarked using a mode-specific technique purpose-built for audio, video, or images. Since a watermark can only contain an extremely limited amount of data, it is important to make the most of the bandwidth it affords. We therefore encode a short identifier and an indicator of where the C2PA manifest, or the signed metadata, can be retrieved. This could be a Content Credentials cloud host or a distributed ledger/blockchain. 

Next, we compute a fingerprint of the media, essentially another short numerical descriptor. The descriptor represents a perceptual key that can be used later to match the content to its Content Credentials, albeit in an inexact way as described earlier. 

Then, the identifier in the watermark and the fingerprint are added to the Content Credential, which already includes data pertaining to the origin of the content and the ingredients and tools that were used to make it. Now we digitally sign the entire package, so that it is uniquely connected to this content and tamper evident. And finally, the Content Credential is injected into the content and stored remotely. And just like that, in a few milliseconds, we have created a durable Content Credential. 
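
As a rough illustration of those steps, here is a toy Python sketch. None of this is the actual C2PA format or tooling: real manifests follow the C2PA serialization and are signed with X.509 certificates, and real watermarks and fingerprints are far more sophisticated. The helper names (toy_watermark, toy_fingerprint), the HMAC stand-in for certificate-backed signing, and the in-memory CREDENTIAL_STORE are all assumptions made only for illustration.

```python
# Toy sketch of durable Content Credential creation (not the real C2PA format).
import hashlib, hmac, json, secrets

SIGNING_KEY = secrets.token_bytes(32)   # stand-in for a certificate-backed private key
CREDENTIAL_STORE = {}                   # stand-in for a Content Credentials cloud / ledger

def toy_fingerprint(pixels: list[int]) -> str:
    """Stand-in perceptual key (real fingerprints tolerate some alteration)."""
    avg = sum(pixels) / len(pixels)
    bits = "".join("1" if p >= avg else "0" for p in pixels)
    return hashlib.sha256(bits.encode()).hexdigest()[:16]

def toy_watermark(pixels: list[int], identifier: str) -> list[int]:
    """Hide the identifier's bits in the least-significant bit of each pixel."""
    bits = "".join(f"{b:08b}" for b in identifier.encode())
    return [(p & ~1) | int(bits[i % len(bits)]) for i, p in enumerate(pixels)]

def create_durable_credential(pixels: list[int], origin_info: dict):
    # 1. Watermark the content with a short identifier (which also tells a
    #    verifier where the manifest lives -- here, the single CREDENTIAL_STORE).
    identifier = secrets.token_hex(8)
    marked = toy_watermark(pixels, identifier)
    # 2. Compute a fingerprint of the (watermarked) media.
    # 3. Add the identifier and fingerprint to the manifest alongside origin data.
    manifest = {
        "identifier": identifier,
        "fingerprint": toy_fingerprint(marked),
        "origin": origin_info,            # device, tools, ingredients, etc.
    }
    # 4. Sign the whole package so tampering is evident, then store it remotely.
    payload = json.dumps(manifest, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    CREDENTIAL_STORE[identifier] = {"manifest": manifest, "signature": signature}
    return marked, manifest, signature    # the credential also travels with the file
```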

When a consumer of this media wishes to check the provenance, the process is reversed. If the provenance and content are intact, we need only verify the signed manifest and display the data. However, if the metadata has been removed, we make use of durability as follows: 

  1. Decode the watermark, retrieving the identifier it stores. 

  2. Use the identifier to look up the stored Content Credential on the appropriate Content Credentials cloud or distributed ledger. 

  3. Check that the manifest and the content match by using the fingerprint to verify that there is a perceptual match, and the watermark has not been spoofed or incorrectly decoded. 

  4. Verify the cryptographic integrity of the manifest and its provenance data. 

Again, within a few milliseconds we can fetch and verify information about how this content was made, even if the metadata was removed maliciously or accidentally. 
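
Continuing the toy sketch above (it reuses toy_fingerprint, SIGNING_KEY, and CREDENTIAL_STORE from that sketch), the recovery path might look like the following. Again, this is only an illustration of the four steps, not real C2PA validation, which verifies certificate chains rather than a shared key.

```python
# Toy recovery path for when the embedded metadata has been stripped.
# Reuses toy_fingerprint, SIGNING_KEY, and CREDENTIAL_STORE from the sketch above.
import hashlib, hmac, json

def toy_watermark_decode(pixels: list[int], id_chars: int = 16) -> str:
    """Read the identifier back out of the least-significant bits.
    Assumes the content has at least id_chars * 8 pixels."""
    bits = "".join(str(p & 1) for p in pixels[: id_chars * 8])
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return raw.decode(errors="replace")

def recover_provenance(pixels: list[int]):
    # 1. Decode the watermark, retrieving the identifier it stores.
    identifier = toy_watermark_decode(pixels)
    # 2. Use the identifier to look up the stored Content Credential.
    record = CREDENTIAL_STORE.get(identifier)
    if record is None:
        return None                               # nothing on file for this ID
    # 3. Check that manifest and content match via the fingerprint (a real
    #    perceptual fingerprint tolerates some alteration; this toy version
    #    requires an exact match).
    if toy_fingerprint(pixels) != record["manifest"]["fingerprint"]:
        return None                               # spoofed or mis-decoded watermark
    # 4. Verify the cryptographic integrity of the manifest.
    payload = json.dumps(record["manifest"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, record["signature"]):
        return None                               # manifest was tampered with
    return record["manifest"]
```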

This approach to durability is not appropriate for every use case. For example, if a photojournalist wishes to focus primarily on privacy, they may not wish to store anything related to their photos and videos on any server or blockchain. Instead, they would ensure that the chain of custody between the camera and the publisher is carefully maintained so that provenance is kept connected and intact, but not stored remotely. 

However, in many cases, durable Content Credentials provide an essential balance between performance and permanence. And although technology providers are just beginning to implement the durability approach now, this idea is nothing new: the C2PA specification has always provided for it through its affordances for “soft bindings.”  

We recognize that although Content Credentials are an important part of the ultimate solution to help address the problem of deepfakes, they are not a silver bullet. For the Content Credentials solution to work, we need it everywhere — across devices and platforms — and we need to invest in education so people can be on the lookout for Content Credentials, feeling empowered to interpret the trust signals of provenance while maintaining a healthy skepticism toward what they see and hear online.  

Malicious parties will always find novel ways to exploit technology like generative AI for deceptive purposes. Content Credentials can be a crucial tool for good actors to prove the authenticity of their content, providing consumers with a verifiable means to differentiate fact from fiction.  

As the adoption of Content Credentials increases and availability grows quickly across news, social media, and creative outlets, durable Content Credentials will become as expected as secure connections in web browsers. Content without provenance will become the exception, provenance with privacy preservation will be a norm, and durability will ensure that everyone has the fundamental right to understand what content is and how it was made. 


March 2024 | This Month in Generative AI: Text-to-Movie

An update on recent breakthroughs in a category of techniques that generate images, audio, and video from a simple text prompt.


by Hany Farid, UC Berkeley Professor, CAI Advisor

News and trends shaping our understanding of generative AI technology and its applications.

Generative AI embodies a class of techniques for creating audio, image, or video content that mimics the human content creation process. Starting in 2018 and continuing through today, techniques for generating highly realistic content have followed an impressive trajectory. In this post, I will discuss some recent breakthroughs in a category of techniques that generate images, audio, and video from a simple text prompt.

Faces

A common computational technique for synthesizing images involves the use of a generative adversarial network (GAN). StyleGAN is, for example, one of the earliest successful systems for generating realistic human faces. When tasked with generating a face, the generator starts by laying down a random array of pixels and feeding this first guess to the discriminator. If the discriminator, equipped with a large database of real faces, can distinguish the generated image from the real faces, the discriminator provides this feedback to the generator. The generator then updates its initial guess and feeds this update to the discriminator in a second round. This process continues with the generator and discriminator competing in an adversarial game until an equilibrium is reached when the generator produces an image that the discriminator cannot distinguish from real faces.
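
To make the adversarial game concrete, here is a heavily simplified PyTorch sketch. The tiny networks, the random stand-in "face" data, and the hyperparameters are assumptions chosen only for illustration; StyleGAN and its successors are vastly larger and more carefully engineered.

```python
# Minimal GAN training loop: a generator and discriminator play the
# adversarial game described above, on tiny 8x8 stand-in "faces".
import torch
import torch.nn as nn

latent_dim, img_dim = 16, 64                       # 8x8 images, flattened

generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, img_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(img_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCELoss()

real_images = torch.rand(256, img_dim) * 2 - 1     # stand-in for a face dataset

for step in range(1000):
    batch = real_images[torch.randint(0, 256, (32,))]
    noise = torch.randn(32, latent_dim)
    fake = generator(noise)

    # Discriminator: learn to tell real images from generated ones.
    d_loss = (loss_fn(discriminator(batch), torch.ones(32, 1)) +
              loss_fn(discriminator(fake.detach()), torch.zeros(32, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: use the discriminator's feedback to make fakes more convincing.
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```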

Below are representative examples of GAN-generated faces. In two earlier posts, I discussed how photorealistic these faces are and some techniques for distinguishing real from GAN-generated faces.

Eight GAN-generated faces. (Credit: Hany Farid)

Text-to-image

Although they produce highly realistic results, GANs do not afford much control over the appearance or surroundings of the synthesized face. By comparison, text-to-image (or diffusion-based) synthesis affords more rendering control. Models are trained on billions of images that are  accompanied by descriptive captions, and each training image is progressively corrupted until only visual noise remains. The model then learns to denoise each image by reversing this corruption. This model can then be conditioned to generate an image that is semantically consistent with a text prompt like “Pope Francis in a white Balenciaga coat.” 
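
As a rough sketch of that corrupt-then-denoise training loop, consider the following PyTorch toy (text conditioning is omitted). The noise schedule, network, and stand-in data are illustrative assumptions, not drawn from any real text-to-image model.

```python
# Toy diffusion training: progressively noise images, then train a model
# to predict (and hence be able to remove) that noise.
import torch
import torch.nn as nn

T = 100                                            # number of noise steps
betas = torch.linspace(1e-4, 0.02, T)              # illustrative noise schedule
alphas_cum = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0, t):
    """Forward process: corrupt clean images x0 to noise level t."""
    noise = torch.randn_like(x0)
    a = alphas_cum[t].sqrt().view(-1, 1)
    b = (1 - alphas_cum[t]).sqrt().view(-1, 1)
    return a * x0 + b * noise, noise

denoiser = nn.Sequential(nn.Linear(64 + 1, 128), nn.ReLU(), nn.Linear(128, 64))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

images = torch.rand(256, 64)                       # stand-in for training images
for step in range(1000):
    x0 = images[torch.randint(0, 256, (32,))]
    t = torch.randint(0, T, (32,))
    xt, noise = add_noise(x0, t)
    t_feat = (t.float() / T).view(-1, 1)           # crude timestep conditioning
    pred = denoiser(torch.cat([xt, t_feat], dim=1))
    loss = ((pred - noise) ** 2).mean()            # learn to predict the noise
    opt.zero_grad()
    loss.backward()
    opt.step()
```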

From Adobe Firefly to OpenAI's DALL-E, Midjourney to Stable Diffusion, text-to-image generation is capable of generating highly photorealistic images with increasingly fewer obvious visual artifacts (like hands with too many or too few fingers).

Text-to-audio

In 2019, researchers were able to clone the voice of Joe Rogan from eight hours of voice recordings. Today, from only one minute of audio, anyone can clone any voice. What is most striking about this advance is that unlike the Rogan example, in which a model was trained to generate only Rogan's voice, today's zero-shot, multi-speaker text-to-speech can clone a voice not seen during training. Also striking is the easy access to these voice-cloning technologies through low-cost commercial or free open-source services. Once a voice is cloned, text-to-audio systems can convert any text input into a highly compelling audio clip that is difficult to distinguish from an authentic audio clip. Such fake clips are being used for everything from scams and fraud to election interference.

Text-to-video

A year ago, text-to-video systems tasked with creating short video clips from a text prompt like "Pope Francis walking in Times Square wearing a white Balenciaga coat" or "Will Smith eating spaghetti" yielded videos of which nightmares are made. A typical video consists of 24 to 30 still images per second. Generating many realistic still images, however, is not enough to create a coherent video. These earlier systems struggled to create temporally coherent and physically plausible videos in which the inter-frame motion was convincing. 

However, just this month researchers from Google and OpenAI released a sneak peek into their latest efforts. While not perfect, the resulting videos are stunning in their realism and temporal consistency. One of the major breakthroughs in this work is the ability to generalize existing text-conditional image models to train on entire video sequences in which the characteristics of a full space-time video sequence can be learned.

In the same way that text-to-image models extend the range of what is possible as compared to GANs, these text-to-video models extend the ability to create realistic videos beyond existing lip-sync and face-swap models that are designed specifically to manipulate a video of a person talking.

Text-to-audio-to-video

Researchers from the Alibaba Group released an impressive new tool for generating a video of a person talking or singing. Unlike earlier lip-sync models, this technique requires only a single image as input, and the image is then fully animated to be consistent with any audio track. The results are remarkable, including a video of the Mona Lisa reading a Shakespearean sonnet.

When paired with text-to-audio, this technology can generate, from a single image, a video of a person saying (or singing) anything the creator wishes.

Looking ahead

I've come to learn not to make bold predictions about when and what will come next in the space of generative AI. I am, however, comfortable predicting that full-blown text-to-movie (combined audio and video) will soon be here, allowing for the generation of video clips from text such as: "A video of a couple walking down a busy New York City street with background traffic sounds as they sing Frank Sinatra's New York, New York." While there is much to be excited about on the content creation and creativity side, legitimate concerns persist and need to be addressed. 

While there are clear and compelling positive use cases of generative AI, we are already seeing troubling examples in the form of people creating non-consensual sexual imagery, scams and frauds, and disinformation.

Some generative AI systems have been accused of infringing on the rights of creators whose content has been ingested into large training data sets. As we move forward, we need to find an equitable way to compensate creators and to give them the ability to opt in to or out of being part of training future generative AI models.

Relatedly, last summer saw a historic strike in Hollywood by writers and performers. A particularly contentious issue centered around the use (or not) of AI and how workers would be protected. The writers’ settlement requires that AI-generated material cannot be used to undermine a writer’s credit, and its use must be disclosed to writers. Protections for performers include that studios give fair compensation to performers for the use of digital replicas, and for the labor unions and studios to meet twice a year to assess developments and implications of generative AI. This latter agreement is particularly important given the pace of progress in this space.

Author bio: Professor Hany Farid is a world-renowned expert in the field of misinformation, disinformation, and digital forensics. He joined the Content Authenticity Initiative (CAI) as an advisor in June 2023. The CAI is an Adobe-led community of media and tech companies, NGOs, academics, and others working to promote adoption of the open industry standard for content authenticity and provenance.

Professor Farid teaches at the University of California, Berkeley, with a joint appointment in electrical engineering and computer sciences at the School of Information. He’s also a member of the Berkeley Artificial Intelligence Lab, Berkeley Institute for Data Science, Center for Innovation in Vision and Optics, Development Engineering Program, and Vision Science Program, and he’s a senior faculty advisor for the Center for Long-Term Cybersecurity. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception.

He received his undergraduate degree in computer science and applied mathematics from the University of Rochester in 1989, his M.S. in computer science from SUNY Albany, and his Ph.D. in computer science from the University of Pennsylvania in 1997. Following a two-year post-doctoral fellowship in brain and cognitive sciences at MIT, he joined the faculty at Dartmouth College in 1999 where he remained until 2019.

Professor Farid is the recipient of an Alfred P. Sloan Fellowship and a John Simon Guggenheim Fellowship, and he’s a fellow of the National Academy of Inventors.


February 2024 | This Month in Generative AI: Election Season

From AI-resurrected dictators to AI-powered interactive chatbots, political campaigns around the world are deploying the technology to expand their audience and win over voters. This month, Hany Farid, UC Berkeley Professor and CAI Advisor, looks at how it is becoming increasingly easy to combine fake audio with video, its clear effect on the electorate, and existing approaches to authenticating digital media.


by Hany Farid, UC Berkeley Professor, CAI Advisor

News and trends shaping our understanding of generative AI technology and its applications.

In May of 2019, a manipulated video of House Speaker Nancy Pelosi purportedly slurring her words in a public speech racked up over 2.5 million views on Facebook. Although the video was widely reported to be a deepfake, it was what we would today call a “cheap fake.” The original video of Speaker Pelosi was simply slowed down to make her sound inebriated — no AI needed. The cheap fake was, however, a harbinger.

Around 2 billion citizens will vote this year in some 70 elections around the globe. At the same time, generative AI has emerged as a powerful technology that can entertain, defraud, and deceive.

Today, nearly anyone can use generative AI to create hyper-realistic images from only a text prompt, clone a person's voice from a 30-second recording, or modify a video to make the speaker say things they never did or would say. Perhaps not surprisingly, generative AI is finding its way into everything from local to national and international politics. Some of these applications are used to bolster a candidate, but many are designed to be harmful to a candidate or party, and all applications raise new and complex questions.

Trying to help

In October of last year, New York City Mayor Eric Adams used generative AI to make robocalls in which he spoke Mandarin and Yiddish. (Adams only speaks English.) The calls did not disclose that the voice was AI-generated, and at least some New Yorkers believe that Adams is multilingual: "People stop me on the street all the time and say, ‘I didn’t know you speak Mandarin,’" Adams said. While the content of the calls was not deceptive, some claimed that the calls themselves were deceptive and an unethical use of AI.

Not to be outdone, earlier this year Representative Dean Phillips deployed a full-blown OpenAI-powered interactive chatbot to bolster his long-shot bid for the Democratic nomination in the upcoming presidential primary. The chatbot disclosed that it was an AI-bot and allowed voters to ask questions and hear an AI-generated response in an AI-generated version of Phillips's voice. Because this bot violated OpenAI's terms of service, it was eventually taken offline.

Trying to harm

In September of last year, Slovakia — a country that shares part of its eastern border with Ukraine — saw a last-minute and dramatic shift in its parliamentary election. Just 48 hours before election day, the party of the pro-NATO and Western-aligned candidate Michal Šimečka was leading in the polls by some four points. A fake audio clip of Šimečka seeming to claim that he was going to rig the election spread quickly online, and two days later the party of the pro-Moscow candidate Robert Fico won the election by five points. It is impossible to say exactly how much the audio impacted the election outcome, but this incident raised concerns about the use of AI in campaigns.

Fast-forward to January of this month when the state of New Hampshire was holding the nation's first primary for the 2024 US presidential election. On the eve of the primary, more than 20,000 New Hampshire residents received robocalls impersonating President Biden. The call urged voters not to vote in the primary and to "save your vote for the November election." It took two weeks before New Hampshire’s Attorney General announced that his office identified two businesses behind these robocalls. 

The past few months have also seen an increasing number of viral images making the rounds on social media. These range from faked images of Trump with convicted child sex trafficker Jeffrey Epstein and a young girl, to faked images of Biden in military fatigues on the verge of authorizing military strikes. 

On the video front, it is becoming increasingly easier to combine fake audio with video to make people say and do things they never did. For example, a speech originally given by Vice President Harris on April 25, 2023, at Howard University was digitally altered to replace the voice track with a seemingly inebriated and rambling Harris.

And these are just a few examples of the politically motivated deepfakes that we have already started to see as the US national election heats up. In the coming months, I'll be keeping track of these examples as they continue to emerge.

Something in between

In the lead-up to Indonesia's election earlier in February, a once-feared army general who ruled the country with an iron fist for more than three decades was AI-resurrected with a message for voters. And in India, M. Karunanidhi, the former leader of the Dravida Munnetra Kazhagam party who died in 2018, was AI-resurrected with an endorsement for his son, the sitting chief minister of the state of Tamil Nadu. I expect this type of virtual endorsement will become an (ethically complex) trend.

Looking ahead

There are two primary approaches to authenticating digital media. Reactive techniques analyze various aspects of an image or video for traces of implausible or inconsistent properties. Learn more about these photo forensics techniques in my series for the CAI. Proactive techniques, on the other hand, operate at the source of content creation, embedding into or extracting from an image or video an identifying digital watermark or signature. 

Although not perfect, these combined reactive and proactive technologies will make it harder (but not impossible) to create a compelling fake and easier to verify the integrity of real content. The creation and detection of manipulated media, however, is inherently adversarial. Both sides will continually adapt, making distinguishing the real from the fake an ongoing challenge.

While it is relatively straightforward to regulate AI-powered non-consensual sexual imagery, child abuse imagery, and content designed to defraud, regulating political speech is more fraught. We, of course, want to give a wide berth for political discourse, but there should be limits on activities like those we saw in New Hampshire, where bad actors attempt to interfere with our voting rights. 

As a first step, following the New Hampshire AI-powered robocalls, the Federal Communications Commission quickly announced a ban on AI-powered robocalls. While the ruling is fairly narrow and doesn't address the wider issue of AI-powered election interference or non-AI-powered interference, it is a reasonable precaution as we all try to sort out this brave new world where anybody's voice or likeness can be manipulated.

As we continue to wrestle with these complex questions, we as consumers have to be particularly vigilant as we enter what is sure to be a highly contentious election season. We should be vigilant not to fall for disinformation just because it conforms to our personal views, we should be vigilant not to be part of the problem by spreading disinformation, and we should be vigilant to protect our and others' rights (even if we disagree with them) to participate in our democracy.



January 2024 | This Month in Generative AI: Frauds and Scams



by Hany Farid, UC Berkeley Professor, CAI Advisor

News and trends shaping our understanding of generative AI technology and its applications.

Advances in generative AI continue to stun and amaze. It seems like every month we see rapid progression in the power and realism of AI-generated images, audio, and video. At the same time, we are also seeing rapid advances in how the resulting content is being weaponized against individuals, societies, and democracies. In this post, I will discuss trends that have emerged in the new year.

First it was Instagram ads of Tom Hanks promoting dental plans. Then it was TV personality Gayle King hawking a sketchy weight-loss plan. Next, Elon Musk was shilling for the latest crypto scam, and, most recently, Taylor Swift was announcing a giveaway of Le Creuset cookware. All ads, of course, were fake. 

How it works

Each of these financial scams was powered by a so-called lip-sync deepfake, itself powered by two separate technologies. First, a celebrity's voice is cloned from authentic recordings. Where it used to take hours of audio to convincingly clone a person's voice, today it takes only 60 to 90 seconds of authentic recording. Once the voice is cloned, an audio file is generated from a simple text prompt in a process called text-to-speech. 

In a variant of this voice cloning, a scammer creates a fake audio file by modifying an existing audio file to sound like someone else. This process is called speech-to-speech. This latter fake is a bit more convincing because with a human voice driving the fake, intonation and cadence tend to be more realistic.

Once the voice has been created, an original video is modified to make the celebrity’s mouth region move consistently with the new audio. Tools for both the voice cloning and video generation are now readily available online for free or for a nominal cost.

Although the resulting fakes are not (yet) perfect, they are reasonably convincing, particularly when being viewed on a small mobile screen. The genius — if you can call it that — of these types of fakes is that they can fail 99% of the time and still be highly lucrative for scam artists. More than any other nefarious use of generative AI, it is these types of frauds and scams that seem to have gained the most traction over the past few months. 

Protecting consumers from AI-powered scams

These scams have not escaped the attention of the US government. In March of last year, the Federal Trade Commission (FTC) warned citizens about AI-enhanced scams. And more recently, the FTC announced a voice cloning challenge designed to encourage "the development of multidisciplinary approaches — from products to policies to procedures — aimed at protecting consumers from AI-enabled voice cloning harms, such as fraud and the broader misuse of biometric data and creative content. The goal of the challenge is to foster breakthrough ideas on preventing, monitoring, and evaluating malicious voice cloning."

The US Congress is paying attention, too. A bipartisan bill, the NO FAKES Act, would "prevent a person from producing or distributing an unauthorized AI-generated replica of an individual to perform in an audiovisual or sound recording without the consent of the individual being replicated." 

Acknowledging that there may be legitimate uses of AI-powered impersonations, the Act has carve-outs for protected speech: "Exclusions are provided for the representation of an individual in works that are protected by the First Amendment, such as sports broadcasts, documentaries, biographical works, or for purposes of comment, criticism, or parody, among others." While the NO FAKES Act focuses on consent, Adobe’s proposed Federal Anti-Impersonation Right (the FAIR Act) provides a new mechanism for artists to protect their livelihoods while also protecting the evolution of creative style.

Looking ahead

Voice scams will come in many forms, from celebrity-powered scams on social media to highly personalized scams on your phone. The conventional wisdom of "If it seems too good to be true, it probably is" will go a long way toward protecting you online. In addition, for now at least, the videos often have telltale signs of AI-generation because there are typically several places where the audio and video appear de-synchronized, like a badly dubbed movie. Recognizing these flaws just requires slowing down and being a little more thoughtful before clicking, sharing, and liking.

Efforts are underway to add digital provenance or verifiable Content Credentials to audio. Respeecher, a voice-cloning marketplace gaining traction among creators and Hollywood studios, is adding Content Credentials to files generated with its tool.

For the more personalized attacks that will reach you on your phone in the form of a loved one saying they are in trouble and in need of cash, you and your family should agree on an easy-to-remember secret code word that can easily distinguish an authentic call from a scam.



How a voice cloning marketplace is using Content Credentials to fight misuse

It has become easier for people to create and deploy deepfakes at scale. Learn how Respeecher has embraced transparency in AI by integrating Content Credentials into their audio marketplace.

By Coleen Jose, Head of Community, CAI

Taylor Swift isn’t selling Le Creuset cookware. However, deepfake advertisements recently appeared that used video clips of the pop star and synthesized versions of her voice to look and sound as if she were doing a giveaway offer. Unsuspecting fans and others who trusted Swift’s brand clicked the ads and entered their personal and credit card information for a chance to win.

Public figures are prime targets for scams that are being created and scaled with help from voice cloning and video deepfake tools.

In July 2023, a video of Lord of the Rings star Elijah Wood appeared on social media. Originally created with Cameo, an app that lets people purchase personalized video messages from celebrities, the clip featured Wood sharing words of encouragement with someone named Vladimir who was struggling with substance abuse. “I hope you get the help that you need,” the actor said.

Soon after, it emerged that the heavily edited video was part of a Russian disinformation campaign that sought to insinuate that Ukrainian President Volodymyr Zelensky had a drug and alcohol problem. At least six other celebrities, including boxer Mike Tyson and Scrubs actor John McGinley, unwittingly became part of the scheme. Despite clumsy editing, the videos were shared widely on social platforms.

Now, imagine if the level of effort required to create these videos dropped to zero. Instead of commissioning content on Cameo, one could choose a celebrity or obtain enough video content of an individual and use AI to make them say or do anything.

This challenge is intensifying with the spread of AI-enabled audio and voice cloning technology that can produce convincing deepfakes. Governments have started taking steps to protect the public; for example, the US Federal Trade Commission issued a consumer warning about scams enabled by voice cloning and announced a prize for a solution to address the threat. Both the Biden administration and the European Union have called for clear labeling of AI-generated content. 

These scams and non-consensual imagery, which range from fake product endorsements to geopolitical campaigns, are increasingly targeting people who aren’t famous singers or politicians.

A commitment to verifiable AI-generated audio

Dmytro Bielievstov knows that top-down regulation isn’t enough. In 2017, along with Alex Serdiuk and Grant Reaber, the Ukraine-based software engineer co-founded Respeecher, a tool that allows anyone to speak in another person's voice using AI. “If someone uses our platform to pretend to be someone they are not, such as a fake news channel with synthetic characters, it might be challenging for the audience to catch that because of how realistic our synthetic voices are,” he said.

Keenly aware of the potential for misuse, the Respeecher team has embraced the use of Content Credentials through the Content Authenticity Initiative’s (CAI) open-source tools. Content Credentials are based on the C2PA open standard to verify the source and history of digital content. With an estimated 4 billion people around the world heading to the polls in over 50 elections this year, there’s an urgent need to ensure clarity about the origins and history of the visual stories and media we experience online. And there’s momentum for C2PA adoption across industries and disciplines.

We recently spoke with Dmytro about how Respeecher implemented Content Credentials, his tips for getting started, and why provenance is critical for building trust into the digital ecosystem.

This interview has been edited for length and clarity.

How would you describe Respeecher?

Respeecher lets one person perform in the voice of another. At the heart of our product is a collection of high-quality voice conversion algorithms. We don't use text as input; instead, we utilize speech. This enables individuals to not just speak but also perform with various vocal styles, including nonverbal and emotional expressions. For example, we enable actors to embody different vocal masks, similar to how they can put on makeup. 

Who are you looking to reach with the Voice Marketplace?

Initially, our focus was on the film and TV industry. We’ve applied our technology in big studio projects, such as Lucasfilm’s limited series “Obi-Wan Kenobi,” where we synthesized James Earl Jones’ voice for Darth Vader. 

Now, we’re launching our Voice Marketplace, which democratizes the technology we’ve used in Hollywood for everyone. The platform allows individuals to speak using any voice from our library. This will enable small creators to develop games, streams, and other content, including amateur productions. While we’ll maintain oversight of content moderation and voice ownership, we’ll now have less control over the content’s destination and usage. 

What motivated you to join the Content Authenticity Initiative?

First, the CAI aids in preventing misinformation. We do not allow user-defined voices (say President Biden’s), as controlling that would be quite difficult. Still, users should know that the content they’re consuming involves synthetic speech. As a leader in this space, we want to ensure that all generated content from our marketplace has Content Credentials. In the future, as all browsers and content distribution platforms support data provenance, it will be easy for consumers to verify how the audio in a video was produced. If something lacks cryptographic Content Credentials, it will automatically raise suspicion about authenticity. 

Secondly, the initiative addresses the needs of content creators. By embedding credentials in the audio they produce, they can make it clear that the work is their own artistic creation. 

How do Content Credentials work in Respeecher?

We have integrated CAI's open-source C2PA tool into our marketplace. Whenever synthetic audio is rendered on our servers, it is automatically cryptographically signed as being a product of the Respeecher marketplace. When clients download the audio, it contains metadata with Content Credentials stating that it was converted into a different voice by Respeecher. GlobalSign, our third-party certificate authority, signs our credentials with its cryptographic key so that anyone who receives our content can automatically verify that it was signed by us and not an impersonator.

The metadata isn't a watermark but rather a side component of the file. If the metadata is removed, consumers are alerted that the file's source is uncertain. If the file is modified after it’s downloaded (say someone changes the name of the target voice), the cryptographic signature won't match the file's contents. So it’s impossible to manipulate the metadata without invalidating Respeecher’s signature. 
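
To illustrate the tamper evidence Dmytro describes, here is a small Python sketch using an Ed25519 keypair from the cryptography library as a stand-in for Respeecher's certificate-backed signing. The metadata fields and the way the bytes are combined are assumptions for illustration, not the C2PA serialization or Respeecher's actual pipeline.

```python
# Sketch of signature-based tamper evidence: editing the signed bytes
# (the audio or its metadata) invalidates the signature.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()         # stand-in for the signing key
public_key = private_key.public_key()              # distributed via the certificate

audio_bytes = b"...rendered synthetic audio..."
metadata = b'{"generator": "Respeecher Marketplace", "voice": "library-voice-42"}'
signature = private_key.sign(metadata + audio_bytes)

# A verifier with the public key checks that nothing has changed since signing.
try:
    public_key.verify(signature, metadata + audio_bytes)
    print("Signature valid: metadata and audio are as signed.")
except InvalidSignature:
    print("Signature invalid: the file or its metadata was modified.")

# Change the named target voice after download and verification fails.
tampered = metadata.replace(b"library-voice-42", b"someone-else")
try:
    public_key.verify(signature, tampered + audio_bytes)
except InvalidSignature:
    print("Tampering detected.")
```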

All voice-cloned audio files downloaded from the Respeecher Marketplace include Content Credentials for AI transparency.

What challenges did you encounter while implementing Content Credentials technology?

Surprisingly, the biggest challenge was obtaining keys from our certificate authority. We had to prove that we were a legitimate organization, which took several weeks and some back and forth with GlobalSign.

Another challenge was that cryptography, particularly public key infrastructure (PKI), can be challenging to grasp for someone who isn’t an expert. Our team had to understand the specifics of C2PA, figure out appropriate configurations, and determine whether we needed a third-party certificate authority. These nuances required time and effort, especially since we don’t have a cryptographic expert on our team. However, the CAI team and community were incredibly helpful in assisting us with these challenges.

What advice do you have for other developers getting started with Content Credentials and the CAI’s open-source tools?

Start with CAI’s Getting Started guide, but then dedicate time to read the C2PA specification document. Although it is somewhat long and intimidating, it’s surprisingly comprehensible for non-experts. 

Also, utilize ChatGPT to help explain complex concepts to you. Even though ChatGPT doesn’t know the technical details of C2PA (because its current version has limited access to information beyond certain dates), it still does a great job explaining concepts such as PKI and cryptographic signatures.

What's next for you when it comes to advances in AI safety and Content Credentials?

We’re planning to give users of our marketplace the option to add their own authorship credentials to the Content Credentials. Currently, the metadata indicates that the audio was modified by Respeecher, but it doesn't attribute the audio to its creator. Some users may prefer to remain anonymous, but others will choose to include their credentials. 

Respeecher will continue to be at the forefront of initiatives to adopt data provenance standards across the AI synthetic media industry. It’s essential for companies to create a unified authentication layer for audio and video content, and content distribution platforms like YouTube and news websites have a crucial role to play in embracing this technology. Just as a lack of HTTPS warns users of potential security issues, a similar mechanism could alert users to the source of an audio file, enhancing transparency and authenticity.

We are also closely watching how the concept of cloud-hosted metadata develops. Embedding hard-to-remove watermarks in audio and video signals without making them obvious remains a largely unsolved problem. By storing metadata in the cloud and enabling checks for pre-signed content, we can potentially simplify authentication and the synthetic media detection problem.

The Respeecher team, pictured in their office in Kyiv, Ukraine, has won awards for their projects with global brands, including an Emmy and a Webby. Photo courtesy: Respeecher


CAI Symposium 2023: momentum for a trustworthy and transparent digital ecosystem

Read our takeaways and see highlights from this year's CAI Symposium co-hosted with Stanford Data Science.

Dana Rao, EVP, General Counsel and Chief Trust Officer at Adobe opened this year’s CAI Symposium co-hosted with Stanford Data Science.

By Coleen Jose, Head of Community, CAI

When a group of 59 organizations gathered in January 2020 for the first Content Authenticity Initiative (CAI) Summit, generative AI tools were not available for mass consumption. The main themes then were media literacy, context and intent, and the need to find a solution that drives trust and transparency online while mitigating the potential for bad actors.   

Those discussions laid the foundation for the C2PA standard, verifiable Content Credentials, and the CAI’s open-source tools. Attendees envisioned these pillars as a way of ensuring clarity about the origins and history of the visual stories and media we consume online. 

Back in 2020, the proliferation of mis- and disinformation across social media was also top of mind. Newsrooms and readers were struggling with challenging questions: How can we verify the origins of photographs and videos? How do we confirm facts emerging from a breaking news event? How do detection tools and digital forensics keep up with technological advances and the torrent of content before it goes viral? 

Today, the speed of innovation in generative AI has again brought trust and transparency to the forefront. The potential of AI to both do good and cause harm is a dominant question in industry, policy, and media circles. 

At this year’s CAI Symposium, held on Dec. 7, similar themes emerged. This time, over 200 people attended—from industry leaders to experts in cryptography, identity, media, and policy. What they had in common was a commitment to the pursuit of an open solution to build trust and transparency into the digital ecosystem. Participants reflected the growing momentum of C2PA adoption across devices, software, services, and platforms.   

Co-hosted with Stanford Data Science, the event was a full day of presentations, breakout sessions, and hands-on demonstrations from the CAI community. 

Here are three takeaways from the CAI Symposium: 

  1. Scaling the chain of trust. Speakers from Adobe, the Partnership on AI, and WITNESS spoke to the importance of scaling trust and transparency solutions from the ground up, starting with frontier models, research, and development stages through to the end-user experience. We saw hardware innovations in the form of Leica’s M11-P camera, the world’s first Content Credentials-enabled camera, which the company plans to extend to other products. With the introduction of generative AI tools on mobile devices, Qualcomm shared how building the C2PA standard into its chip processors can add transparency and mitigate potential harms. 

  2. Elections and media integrity. Forty countries representing 3.2 billion people will go to the polls in 2024, and we’re already seeing AI-generated images and videos influencing the democratic process. Additionally, a steady stream of misleading imagery and deepfakes is emerging from the wars in Ukraine and Gaza. The stakes have never been higher for media authenticity, and our work in this space is critically important. We were thrilled to highlight that BioBioChile became the world’s first media organization to implement Content Credentials on a live news website.  

 3. User experience and consumer education. Significant work lies ahead to establish the Content Credentials icon as the starting point for a new language of authenticity and context. Looking ahead to 2024 and beyond, one area with immense potential for innovation is ensuring an accessible, simple, and clear experience across mobile and web platforms. As technology continues to advance, it will be crucial to create a seamless and user-friendly interface for verifying content authenticity. This commitment to accessibility and clarity will play a pivotal role in enhancing user experience and promoting trust in the digital landscape. 

The expertise and industry leadership that the CAI community brings spans engineering, product, trust infrastructure, policy, media, creativity, and education. We’re inspired by the ever-increasing adoption of Content Credentials and excited to take on the challenges ahead, together. 

What began as a sea of promising ideas at the 2020 CAI Summit has evolved into a deep focus on deploying Content Credentials. The way forward depends on enabling honest discourse, surfacing challenges, and building solutions. If one thing was evident at this year’s Symposium, it's this community's commitment to that path. 

Thank you to our speakers, moderators, and demo presenters!

  • Opening Remarks by Dana Rao, Adobe

  • Leica Content Credentials by Nico Kohler and Jesko von Oeynhausen, Leica

  • C2PA Primer by Leonard Rosenthol, Adobe

  • Trust and attestation in a modern mobile SoC by Asaf Shen, Qualcomm

  • Navigating Trust: Provenance Technology in Journalism by Santiago Lyon, Adobe and David Clinch, Media Growth Partners

  • Perspectives from the MediFor and SemaFor Programs by Matt Turek, DARPA

  • Content Authenticity with Zero Knowledge by Dan Boneh, Stanford University

  • Safe and Sound? Public Policy and AI by Rebecca Finlay, Partnership on AI  

  • Closing Remarks by Hany Farid, UC Berkeley

Watch the highlights and speaker recordings. Join the movement to restore trust and transparency online.

Photography by Adam Perez

Breakout Sessions and Moderators

  • Identity and Authenticity with Eric Scouten, Adobe and Nathan Freitas, The Guardian Project

  • User Experience for Media Authenticity with Pia Blumenthal and Andrew Kaback, Adobe

  • Open source C2PA with Maurice Fisher and Gavin Peacock, Adobe

  • Provenance in Digital Media, with Santiago Lyon, Adobe

  • Addressing AI Image Abuse Beyond Provenance with Dr. Rebecca Portnoff, Thorn and Henry Ajder, Latent Space

  • The C2PA Technical Specification, Present and Future with Leonard Rosenthol, Adobe/C2PA Technical Chair and Andrew Jenks, Microsoft/C2PA Chair

Demo Presenters

  • DataTrails: Establishing a new model of digital trust by assuring provenance and integrity of data crossing business and application boundaries.

  • Leica Camera AG: The M11-P, the world's first camera to guarantee the source of images through the Content Credentials standard.

  • ProofMode: Capture, share, and preserve verifiable photos and videos for Android and iOS.

  • Nikon: Content Credentials technology exhibited in cameras, including the industry-leading mirrorless Z 9, bringing provenance and authenticity to digital images at the point of capture.

  • Reuters: Preserving trust in photojournalism through authentication technology.

  • SmartFrame: Building trusted collections with New Zealand Rugby and Manchester City F.C., using Content Credentials and image-streaming technology to serve and protect images.

  • Truepic, Pixelstream, BioBioChile: The first end-to-end Content Credentials workflow for news, implemented by BioBioChile with Truepic’s Mobile Capture and Pixelstream’s CDN and image-editor platform.

  • Yuify by Wacom: Prove authorship with secure records and receive a hassle-free licensing tool.

Subscribe to the CAI newsletter to receive community stories and ecosystem news.

Stay connected and consider joining the movement to restore trust and transparency online.


Community showcase: building trust and transparency into our digital ecosystem

Trust in digital media is consequential — from the practical verification of the source of a news photograph to the less visible web infrastructure that impacts society and economies of scale. In this Community Showcase, we focus on solutions that build transparency into our digital ecosystem.

The C2PA global standard and its clear, verifiable Content Credentials are powering a movement for transparency in digital media through the Content Authenticity Initiative’s (CAI) open-source tools and cross-industry community. In this Showcase, hear from CAI members DataTrails, BNB Chain, and Leica Camera AG —  three companies that are contributing to the ecosystem and sharing what they’ve learned about the flexible Content Credentials technology that exists today to verify, maintain, and restore context to content. 

What we cover

  • Maintaining the integrity of data across business and application boundaries with DataTrails

  • Integrating the C2PA provenance standard into decentralized storage with BNB Chain

  • Creative attribution and interoperability with the world’s first Content Credentials-enabled camera, from Leica

  • Q&A 

Speakers

  • Coleen Jose, Head of Community, Content Authenticity Initiative, Adobe

  • Jon Geater, Chief Product & Technology Officer, DataTrails

  • Tomasz Wojewoda, Head of Business Development, BNB Chain

  • Nico Koehler, Head of Product Experience, Leica Camera AG


Leica launches world’s first camera with Content Credentials

Industry-leading camera manufacturer Leica launches the new M11-P camera — the world’s first camera with Content Credentials built-in. This is a significant milestone for the CAI and the future of photojournalism.


This image of the New York City skyline was created with the Leica M11-P and now includes Content Credentials at the point of capture to protect the authenticity of images. At the top right of the image, preview its Content Credentials digital nutrition label including information such as name, dates, changes made and tools used.

By Santiago Lyon, Head of Advocacy and Education, CAI

We are thrilled to announce that industry-leading camera manufacturer Leica is officially launching the new M11-P camera — the world’s first camera with Content Credentials built-in.

This is a significant milestone for the Content Authenticity Initiative (CAI) and the future of photojournalism: It will usher in a powerful new way for photojournalists and creatives to combat misinformation and bring authenticity to their work and consumers, while pioneering widespread adoption of Content Credentials. 

With manipulated content and misinformation more prevalent than ever, trust in the digital ecosystem has never been more critical. We are entering a new era of creativity, where generative AI is expanding access to powerful new workflows and unleashing our most imaginative ideas. The Leica M11-P launch will advance the CAI’s goal of empowering photographers everywhere to attach Content Credentials to their images at the point of capture, creating a chain of authenticity from camera to cloud and enabling photographers to maintain a degree of control over their art, story and context.

A photograph created with the Leica M11-P and its Content Credential.

Leica has implemented the global Coalition for Content Provenance and Authenticity (C2PA) standard in the M11-P camera so that each image is captured with secure metadata. This means it carries information such as camera make and model, as well as content-specific information including who captured an image and when, and how they did so. Each image captured will receive a digital signature, and the authenticity of images can be easily verified by visiting contentcredentials.org/verify or in the Leica FOTOS app.

This is a watershed moment for trust and transparency for photographers and creatives – its significance cannot be overstated. This is the realization of a vision the CAI and our members first set out four years ago, transforming principles of trust and provenance into consumer-ready technology.

With the integration of the CAI framework, Leica will help combat the pervasive issues of mis- and disinformation and preserve trust in digital content and sources. Further, with this integration, recent announcements at MAX, and the broad availability of our free open-source tools powering them, Content Credentials are seeing accelerated adoption around the world, including among photojournalists, news outlets, creative professionals, everyday consumers, social media influencers, artists and innovators.

Adobe co-founded the Content Authenticity Initiative (CAI) in 2019 to help combat the threat of misinformation and help creators get credit for their work. Today the CAI is a coalition of nearly 2,000 members, including Leica Camera, AFP, the Associated Press, the BBC, Getty Images, Microsoft, Reuters, The Wall Street Journal and more, all working together to add a verifiable layer of transparency and trust to content online – via secure metadata called Content Credentials.

Between the tremendous momentum in attracting new members and the growing adoption of Content Credentials by leaders spanning multiple industries, the CAI is ensuring that technological innovations are built on ethical foundations.

Snapshot: How Content Credentials Works

  • Transparency at the point of capture: We believe the chain of authenticity is strongest at the moment a piece of media is created — being able to verify the circumstances of an image’s origin is the foundation for knowing whether to trust it.

  • Get credit for your photography work: Content Credentials enable photojournalists and creatives to assert credit for their work, ensuring that wherever one of their images goes, their identity travels indelibly with it.

  • Bring trust to your digital content with a digital nutrition label: Content Credentials are the digital nutrition label and most widely adopted industry standard for content of all kinds, and the foundation for increased trust and transparency online. 

Leica’s M11-P camera will be available globally at all Leica Stores, the Leica Online Store and authorized dealers, starting today. To learn more, please visit: https://leica-camera.com/m11-p

Content Credentials in Adobe Photoshop is enabled and the image from a Leica M11-P is imported. Here, you can preview the Content Credentials, which identify an ingredient from the Leica camera, signifying that a Content Credential exists.

The photograph is significantly altered using the Sky Replacement tool in Adobe Photoshop. This edit becomes part of the file’s Content Credentials.

The edited image is exported from Adobe Photoshop and inspected using Verify (contentcredentials.org/verify), a CAI website that reads and surfaces Content Credentials where consumers can inspect changes made to an asset.

Read More
Guest User

Passive versus active photo forensics in the age of AI and social media

"There is a long history of marking content to prove authenticity, indicate ownership, and protect against counterfeiting," Hany Farid writes in his final piece in a series exploring passive and active techniques to distinguish real and synthetic images. He dives into watermarking and provenance technology as scalable and interoperable technology that restore transparency online and empower us to verify then trust.

In this era of generative AI, how do we determine whether an image is human-created or machine-made? For the Content Authenticity Initiative (CAI), Professor Hany Farid shares the techniques used for identifying real and synthetic images — including analyzing shadows, reflections, vanishing points, environmental lighting, and GAN-generated faces. A world-renowned expert in the field of misinformation, disinformation, and digital forensics as well as an advisor to the CAI, Professor Farid explores the limits of these techniques and their part in a larger ecosystem needed to regain trust in the visual record. 

This is the final article in a six-part series.

More from this series:

Part 1: From the darkroom to generative AI

Part 2: How realistic are AI-generated faces?

Part 3: Photo forensics for AI-generated faces

Part 4: Photo forensics from lighting environments

Part 5: Photo forensics from lighting shadows and reflections


by Hany Farid

In the past few posts, I have described representative examples of passive forensic techniques. This category of tools for distinguishing the real from the fake assumes no explicit knowledge of the content source or specialized recording equipment. 

The benefit of these techniques is that they are applicable to a broad category of content. The drawback, however, is that they sometimes require manual intervention. The shadow and reflection analysis, for example, requires identifying corresponding points on an object and its shadow or reflection. While this is a relatively simple step for a human analyst, it is still out of reach of automation. As a result, these techniques cannot be applied to the torrent of daily online uploads.

Active forensic techniques, on the other hand, operate at the source of creation, embedding an identifying digital watermark or signature into the content, or extracting one from it. There is a long history of marking content to prove authenticity, indicate ownership, and protect against counterfeiting. For example, Getty Images, a massive image archive, adds a visible watermark to all digital images in their catalog. This allows customers to freely browse images while protecting Getty’s assets. 

An “invisible” or steganographic watermark can be added to a digital image by, for example, tweaking every 10th image pixel so that its color (typically a number in the range of 0 to 255) is even-valued. Because this adjustment is so minor, the watermark is imperceptible. And, because this periodic pattern is unlikely to occur naturally, it can be used to verify an image’s provenance.
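
As a toy illustration of that idea, the sketch below embeds and detects such a parity watermark with NumPy. It is deliberately fragile — exactly the kind of scheme the next paragraph explains is not resilient — and the image here is random placeholder data.

```python
# A minimal sketch of the even-parity watermark described above: force every
# 10th pixel value to be even when embedding, then check how often that
# pattern holds when detecting. Real schemes are far more robust.
import numpy as np

def embed(pixels: np.ndarray, step: int = 10) -> np.ndarray:
    flat = pixels.flatten().astype(np.int64)
    flat[::step] -= flat[::step] % 2          # make every step-th value even
    return flat.reshape(pixels.shape).astype(np.uint8)

def detect(pixels: np.ndarray, step: int = 10, threshold: float = 0.99) -> bool:
    flat = pixels.flatten()
    fraction_even = np.mean(flat[::step] % 2 == 0)
    return fraction_even >= threshold         # ~50% is expected by chance

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)
print(detect(image))          # False: unmarked image
print(detect(embed(image)))   # True: watermark present
```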

The ideal watermark is one that is imperceptible and also resilient to basic manipulations like cropping or resizing. Although the above pixel manipulation is not resilient to, for example, image resizing, many robust watermarking strategies have been proposed that are resilient — though not impervious — to attempts to remove them.

The benefit of watermarks is that identifying information is directly associated with a piece of content before it is released into the world, making identification fast and easy. The drawback to this approach is that watermarks are vulnerable to attack, where an adversary can digitally remove the watermark while leaving the underlying content largely intact.

Therefore, in addition to embedding watermarks, a creator can extract an identifying fingerprint from the content and store it in a secure centralized ledger. (This fingerprint is sometimes called a perceptual hash — see article 2 in Further Reading below.) The provenance of a piece of content can then be determined by comparing the fingerprint of an image or video that is out in the world to the fingerprint stored in the ledger. Both watermarks and fingerprints can be made cryptographically secure, making them very difficult to forge.
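
Here is a minimal sketch of that fingerprinting workflow using a simple average hash, assuming Pillow and NumPy are available; the file names and the match threshold are illustrative, and production systems use far more robust perceptual hashes.

```python
# Fingerprint content at creation, store the fingerprint in a ledger, and
# later compare fingerprints of content found in the wild.
import numpy as np
from PIL import Image

def average_hash(path: str, size: int = 8) -> np.ndarray:
    # Downscale to an 8x8 grayscale thumbnail and threshold against its mean,
    # yielding a 64-bit fingerprint that survives resizing and mild edits.
    gray = np.asarray(Image.open(path).convert("L").resize((size, size)), dtype=float)
    return (gray > gray.mean()).flatten()

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    return int(np.sum(a != b))

# Hypothetical usage: a small distance suggests the uploaded image matches
# the original recorded in the ledger.
ledger = {"asset-001": average_hash("original.jpg")}
upload = average_hash("found_online.jpg")
print(hamming_distance(ledger["asset-001"], upload) <= 5)
```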

This type of watermarking and fingerprinting is equally effective for actively tracking AI-generated and human-recorded content. The benefit of this approach is that it can efficiently scan the billions of online daily uploads. The drawback is that it requires specialized hardware or software to be used at the point of creation and recording. It also requires downstream platforms like YouTube, Twitter, and TikTok to respect the watermarks, and it requires browsers to display the accompanying content labels. Although not perfect, these combined technologies make it significantly harder to create a compelling fake and significantly easier to verify the integrity of real content.

Importantly, this active approach is agnostic as to the truthfulness or value of content. The goal of this approach is simply, but critically, to distinguish between human-recorded and AI-generated content.

A mockup of a social media feed that integrates Content Credentials viewing and interaction. Content Credentials activate the C2PA standard for content authenticity and provenance.

I started this series by talking about my initial journey some 25 years ago into the world of media authentication and forensics, and so I'll end with some thoughts of what may lie ahead. 

A combination of many factors has contributed to a complex and at times chaotic online information ecosystem. With democratized access to powerful editing and creation software, the widespread deployment of generative AI is poised to significantly complicate this environment. Harms ranging from the creation of non-consensual sexual imagery to small- and large-scale fraud and disinformation have already emerged even in these early days of this AI revolution. 

Legitimate concerns have also been raised regarding creators' rights and their very livelihoods. Here we can draw on some historical analogs. The introduction of photography did not — as some 19th-century artists feared — destroy art or artists. It disrupted some aspects like portrait paintings, but it also liberated artists from realism, eventually giving rise to Impressionism and the Modern Art movements. 

Similarly, the digitization of music did not kill music or musicians, but it did fundamentally change the way we produce and listen to music, and it spawned new genres of music. This history suggests that generative AI does not necessarily have to lead to the demise of creators, but that it can, with proper care and thought, be a new enabling medium. 

There is much to be excited about during these early days of the AI revolution. At the same time, there are legitimate and real concerns as to how these new technologies are being weaponized against individuals, societies, and democracies, and how they might disrupt large sections of the workforce. Advances in AI do not need to lead us into an unrecognizable dystopian future. But, technologists, corporate leaders, and regulators need to carefully consider how they develop and deploy technologies, and build in proper safeguards from the point of conception through rollout.

Further reading:

[1] H. Farid. Watermarking ChatGPT, DALL-E and Other Generative AIs Could Help Protect Against Fraud and Misinformation, The Conversation, March 27, 2023.

[2] H. Farid. An Overview of Perceptual Hashing. Journal of Online Trust and Safety, 1(1), 2021.

[3] Z. Epstein, et al. Art and the Science of Generative AI. Science, 380(6650):1110-1111, 2023.

More from this series:

Part 1: From the darkroom to generative AI

Part 2: How realistic are AI-generated faces?

Part 3: Photo forensics for AI-generated faces

Part 4: Photo forensics from lighting environments

Part 5: Photo forensics from lighting shadows and reflections

Author bio: Professor Hany Farid is a world-renowned expert in the field of misinformation, disinformation, and digital forensics. He joined the Content Authenticity Initiative (CAI) as an advisor in June 2023. The CAI is a community of media and tech companies, non-profits, academics, and others working to promote adoption of the open industry standard for content authenticity and provenance.

Professor Farid teaches at the University of California, Berkeley, with a joint appointment in electrical engineering and computer sciences at the School of Information. He’s also a member of the Berkeley Artificial Intelligence Lab, Berkeley Institute for Data Science, Center for Innovation in Vision and Optics, Development Engineering Program, and Vision Science Program, and he’s a senior faculty advisor for the Center for Long-Term Cybersecurity. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception.

He received his undergraduate degree in computer science and applied mathematics from the University of Rochester in 1989, his M.S. in computer science from SUNY Albany, and his Ph.D. in computer science from the University of Pennsylvania in 1997. Following a two-year post-doctoral fellowship in brain and cognitive sciences at MIT, he joined the faculty at Dartmouth College in 1999 where he remained until 2019.

Professor Farid is the recipient of an Alfred P. Sloan Fellowship and a John Simon Guggenheim Fellowship, and he’s a fellow of the National Academy of Inventors.

Read More
Guest User

Photo forensics from lighting shadows and reflections

Where there is light, there are shadows. The relationship between an object, its shadow, and the illuminating light source(s) is geometrically simple, and yet it’s deceptively difficult to get just right in a manipulated or synthesized image.

In this era of generative AI, how do we determine whether an image is human-created or machine-made? For the Content Authenticity Initiative (CAI), Professor Hany Farid shares the techniques used for identifying real and synthetic images — including analyzing shadows, reflections, vanishing points, environmental lighting, and GAN-generated faces. A world-renowned expert in the field of misinformation, disinformation, and digital forensics as well as an advisor to the CAI, Professor Farid explores the limits of these techniques and their part in a larger ecosystem needed to regain trust in the visual record. 

This is the fifth article in a six-part series.

More from this series:

Part 1: From the darkroom to generative AI

Part 2: How realistic are AI-generated faces?

Part 3: Photo forensics for AI-generated faces

Part 4: Photo forensics from lighting environments


by Hany Farid

Where there is light, there are shadows. The relationship between an object, its shadow, and the illuminating light source(s) is geometrically simple, and yet it’s deceptively difficult to get just right in a manipulated or synthesized image. In the image below, the bottle's cast shadow is clearly incongruous with the shape of the bottle. Such obvious errors in a shadow are easy to spot, but more subtle differences can be harder to detect.

CGI model credit to Jeremy Birn, Lighting and Rendering in Maya.

Below are two images in which the bottle and its cast shadow are slightly different. (The rest of the scene is identical). Can you tell which is consistent with the lighting in the rest of the scene?

The geometry of cast shadows is dictated by the 3D shape and location of an object and the illuminating light(s). This relationship is wonderfully simple: A point on an object, its corresponding shadow, and the light source responsible for the shadow all lie on a single line.

Because (in the absence of lens distortion) straight lines in the 3D scene are imaged to straight lines in the 2D image, this basic constraint holds in the image. Locate any point on a shadow and its corresponding point on the object, and draw a line through them. Repeat for as many clearly defined shadow and object points as possible, and for an authentic image, all the lines will intersect at one point — the location of the illuminating light.

Below are the results of this simple geometric analysis applied to the two images above, clearly revealing the fake.

From shadows to reflections

As a student taking the subway to work each morning, I would stare out the window and watch the tunnel walls whiz by superimposed atop reflections of my fellow passengers. On occasion someone’s reflection would catch my eye, and I would look back into the subway car to get a better view. But I would invariably not see them where I thought they would be given the position of their reflection on the window. As a budding image and vision scientist this really bothered me — how could it be that it was so hard to reason about something so simple as a reflection in a window?

Below are two images of the same basic scene. The reflection of the table and garbage can are slightly different; everything else is the same. Can you tell which is correct?

The geometry of reflections is fairly straightforward. Consider standing in front of a mirror and looking at your reflection. An imaginary straight line connects each point on your body with its reflection. These lines are perpendicular to the mirror’s surface and are parallel to one another. When photographed at an oblique angle, however, these imaginary lines will not remain parallel but will converge to a single point. This is the same reason why railroad tracks that are parallel in the world appear to converge in a photograph.

This geometry of reflections suggests a simple forensic technique for verifying the integrity of reflections. Locate any point on an object and its corresponding point on the reflection, and draw a line through them. Repeat for as many clearly defined object and reflection points as possible. For an authentic image, all of these lines should intersect at one point. 

Below are the results of this simple geometric analysis, which clearly reveals the second reflection to be the fake.

Notice that this geometric analysis is exactly the same as that used to analyze shadows. The reason is that in both cases we are exploiting basic principles of perspective projection that dictate the projection of straight lines.
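
For readers who want to see the shared computation, below is a minimal sketch in Python of the constraint used for both shadows and reflections: find the point that best fits all of the lines through matched point pairs, and measure how far each line misses it. The coordinates are fabricated purely for illustration.

```python
# A sketch of the line-intersection consistency check used for both shadows
# and reflections, in 2D image coordinates. All coordinates are made up.
import numpy as np

def best_intersection(pts_a, pts_b):
    # Each (object point, shadow/reflection point) pair defines a line; the
    # projector onto the line's normal measures perpendicular distance to it.
    projectors = []
    for a, b in zip(np.asarray(pts_a, float), np.asarray(pts_b, float)):
        d = (a - b) / np.linalg.norm(a - b)
        projectors.append((np.eye(2) - np.outer(d, d), b))
    A = sum(P for P, _ in projectors)
    rhs = sum(P @ b for P, b in projectors)
    x = np.linalg.solve(A, rhs)                  # least-squares intersection
    err = np.mean([np.linalg.norm(P @ (x - b)) for P, b in projectors])
    return x, float(err)

# A consistent scene: shadows fall along rays cast from a light at (100, -50).
light = np.array([100.0, -50.0])
objects = np.array([[120.0, 30.0], [60.0, 40.0], [150.0, 20.0]])
shadows = objects + 2.0 * (objects - light)
point, error = best_intersection(objects, shadows)
print(point, error)   # ~(100, -50) with ~0 average miss; a fake drifts from this
```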

In practice, there are some limitations to a manual application of this geometric analysis. We must take care to select appropriately matched points on the shadow/reflection and the object. We can best achieve this when the object has a distinct shape, like the corner of a cube or the tip of a cone. Depending on the scene geometry, the constraint lines may be nearly parallel, making the computation of their intersection vulnerable to slight error in selecting matched points. And, it may be necessary to remove any lens distortion in the image that causes straight lines to be imaged as curved lines that will then no longer intersect at a single point.

From Adobe Photoshop to generative AI

When we first developed these techniques, we were primarily concerned with images that were manipulated with Photoshop to, for example, digitally add a person or object and its shadow or reflection. Because our own visual system is not particularly sensitive to inconsistent shadows and reflections (see item 1 in the Further Reading section below) and because it is difficult to precisely match the 3D scene geometry within 2D photo editing software, this technique has proven quite effective at uncovering fakes.

Classic computer-generated imagery is produced by modeling 3D scene geometry, the surrounding illumination, and a virtual camera. As a result, rendered images accurately capture the geometry and physics of natural scenes, including shadows and reflections. In contrast, AI-generated imagery is produced by learning the statistical distribution of natural scenes from a large set of real images. Without an explicit 3D model of the world, it is natural to wonder how accurately synthesized content captures the geometry of shadows and reflections. 

Although we are still early in the age of generative AI, today's AI-generated images seem to struggle to produce perspectively correct shadows and reflections. Below is a typical example, generated using OpenAI's DALL-E 2, in which the shadows are inconsistent (yellow lines), the reflections are impossibly mismatched and missing, and the shadows in the reflection are oriented in exactly the wrong direction. (See item 2 in the Further Reading section below for a more detailed analysis.)

Even in the absence of explicit 3D modeling of objects, surfaces, and lighting — as found in traditional CGI-rendering — AI-generated images exhibit many of the properties of natural scenes. At the same time, cast shadows and reflections in mirrored surfaces are not fully consistent with the expected perspective geometry of natural scenes. 

The trend in generative AI has been that increasingly larger synthesis engines yield increasingly more realistic images. As such, it may just be a matter of time before generative AI will learn to create images with full-blown perspective consistency. Until that time, however, this geometric forensic analysis may prove useful.

Further reading:

[1] S.J. Nightingale, K.A. Wade, H. Farid, and D.G. Watson. Can People Detect Errors in Shadows and Reflections? Attention, Perception, & Psychophysics, 81(8):2917-2943, 2019.

[2] H. Farid. Perspective (In)consistency of Paint by Text. arXiv:2206.14617, 2022. 

More from this series:

Part 1: From the darkroom to generative AI

Part 2: How realistic are AI-generated faces?

Part 3: Photo forensics for AI-generated faces

Part 4: Photo forensics from lighting environments

Author bio: Professor Hany Farid is a world-renowned expert in the field of misinformation, disinformation, and digital forensics. He joined the Content Authenticity Initiative (CAI) as an advisor in June 2023. The CAI is a community of media and tech companies, non-profits, academics, and others working to promote adoption of the open industry standard for content authenticity and provenance.

Professor Farid teaches at the University of California, Berkeley, with a joint appointment in electrical engineering and computer sciences at the School of Information. He’s also a member of the Berkeley Artificial Intelligence Lab, Berkeley Institute for Data Science, Center for Innovation in Vision and Optics, Development Engineering Program, and Vision Science Program, and he’s a senior faculty advisor for the Center for Long-Term Cybersecurity. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception.

He received his undergraduate degree in computer science and applied mathematics from the University of Rochester in 1989, his M.S. in computer science from SUNY Albany, and his Ph.D. in computer science from the University of Pennsylvania in 1997. Following a two-year post-doctoral fellowship in brain and cognitive sciences at MIT, he joined the faculty at Dartmouth College in 1999 where he remained until 2019.

Professor Farid is the recipient of an Alfred P. Sloan Fellowship and a John Simon Guggenheim Fellowship, and he’s a fellow of the National Academy of Inventors.

Read More
Guest User

Photo forensics from lighting environments

In his latest piece, digital forensics expert and CAI advisor Hany Farid investigates how lighting environments can help distinguish real images from AI-generated ones. He dives into the techniques used in the computer-generated imagery (CGI) technology we've grown accustomed to through entertainment and gaming, and in today's AI systems producing photorealistic images.

In this era of generative AI, how do we determine whether an image is human-created or machine-made? For the Content Authenticity Initiative (CAI), Professor Hany Farid shares the techniques used for identifying real and synthetic images — including analyzing shadows, reflections, vanishing points, environmental lighting, and GAN-generated faces. A world-renowned expert in the field of misinformation, disinformation, and digital forensics as well as an advisor to the CAI, Professor Farid explores the limits of these techniques and their part in a larger ecosystem needed to regain trust in the visual record. 

This is the fourth article in a six-part series.

More from this series:

Part 1: From the darkroom to generative AI

Part 2: How realistic are AI-generated faces?

Part 3: Photo forensics for AI-generated faces


by Hany Farid

Classic computer-generated imagery (CGI) is produced by modeling 3D scene geometry, the surrounding illumination, and a virtual camera. As a result, rendered images accurately capture the geometry and physics of natural scenes. By contrast, AI-generated imagery is produced by learning the statistical distribution of natural scenes. Without an explicit 3D model of the world, we might expect that these images will not always accurately capture the 3D properties of natural scenes. 

The lighting of a scene can be complex, because any number of lights can be placed in any number of positions, leading to different lighting environments. Under some fairly modest assumptions, however, it has been shown that a wide range of lighting can be modeled as a weighted sum of nine 3D lighting environments that each capture a variety of lighting patterns from different directions.

My team and I first wondered if lighting environments on AI-generated faces match those on photographic images. Next, we wondered if the lighting on multiple faces or objects in an image is the same, as would be expected in an authentic photo.

Analyzing lighting environments first requires fitting and aligning a 3D model to individual objects. Fortunately, for faces, fitting a 3D model to a single image is a problem that has largely been solved. (See article 1 in the Further Reading section.) 

Below you’ll see a 3D model fitted to an AI-generated face. From this aligned 3D model, we can estimate the surrounding lighting environment, which reduces to nine numeric values corresponding to the weight on each of nine 3D lighting environments.

An AI-generated face and a fitted 3D model. (Credit: Hany Farid)
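
As a rough sketch of this estimation step, the code below fits the nine lighting values by least squares from surface normals and observed brightness, using the standard second-order real spherical-harmonic basis. The normals and intensities here are synthetic placeholders rather than values from an actual fitted face model.

```python
# Estimate a nine-parameter lighting environment from per-vertex normals and
# observed intensities via linear least squares (Lambertian assumption).
import numpy as np

def sh_basis(normals: np.ndarray) -> np.ndarray:
    # Second-order real spherical harmonics evaluated at unit normals (x, y, z).
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z**2 - 1),
        1.092548 * x * z,
        0.546274 * (x**2 - y**2),
    ], axis=1)                                   # shape: (n_points, 9)

def estimate_lighting(normals: np.ndarray, intensities: np.ndarray) -> np.ndarray:
    coeffs, *_ = np.linalg.lstsq(sh_basis(normals), intensities, rcond=None)
    return coeffs                                # the nine lighting values

# Two faces in the same photo should yield similar coefficient vectors; a large
# difference is a cue that one of them may have been synthesized or spliced in.
rng = np.random.default_rng(1)
normals = rng.normal(size=(500, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
true_coeffs = rng.normal(size=9)
intensities = sh_basis(normals) @ true_coeffs
print(np.allclose(estimate_lighting(normals, intensities), true_coeffs))  # True
```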

As you can see in the plot on the left below, there are some significant deviations between the nine lighting values for real faces (horizontal axis) and diffusion-generated faces (vertical axis). Interestingly, as you can see in the plot on the right, the lighting on GAN-generated faces (see previous post) is nearly indistinguishable from the lighting on real faces.

The relationship between lighting environments on real (horizontal axis) and AI-generated (vertical axis) faces. The plot on the left corresponds to diffusion-based faces, the plot on the right corresponds to GAN-generated faces. (Credit: Hany Farid)

While a comparison of lighting environments may expose a diffusion-based face (or object), it appears that this approach will not provide much insight for GAN-based faces. This is the reverse of our previously described technique, which was applicable to GAN-based faces but not diffusion-based faces, further highlighting the need for a wide variety of approaches.

The above analysis can be expanded to compare the lighting between two people or two objects situated side by side in an image where, in a naturally photographed scene, we expect the lighting to be the same. For this analysis, we find that diffusion-based faces and objects exhibit relatively large (but not always perceptually obvious) differences in lighting. (See article 3 in the “Further reading” section below for more details.)

The strength of this lighting analysis is that we have discovered a fairly fundamental limitation of today's AI-generated content: In the absence of explicit 3D models of the physical world, diffusion-based models struggle to create images that are internally (across multiple people/objects) and externally (as compared to photographic images) consistent. The weakness of this technique is that, unlike other approaches, it does not necessarily lend itself to a fully automatic and high-throughput analysis. The fitting of 3D models can be somewhat time intensive and require manual intervention.

Another strength of this technique is that even when made aware of the issue of lighting inconsistencies, it is not easy for an adversary to adjust the image. That is because the pattern of illumination is the result of an interaction of the 3D lighting and the 3D scene geometry, whereas most photo editing occurs in 2D. Also, as you will see in the next post, there are additional aspects to lighting that we can exploit to distinguish the real from the fake.

Further reading:

[1] Y. Feng, H. Feng, M.J. Black, and T. Bolkart. Learning an animatable detailed 3D face model from in-the-wild images. ACM Transactions on Graphics, Proc. SIGGRAPH, 40(4):88:1–88:13, 2021.

[2] M. Boháček and H. Farid. A Geometric and Photometric Exploration of GAN and Diffusion Synthesized Faces. Workshop on Media Forensics at CVPR, 2023. 

[3] H. Farid. Lighting (In)consistency of Paint by Text. arXiv:2207.13744, 2022.

More from this series:

Part 1: From the darkroom to generative AI

Part 2: How realistic are AI-generated faces?

Part 3: Photo forensics for AI-generated faces

Author bio: Professor Hany Farid is a world-renowned expert in the field of misinformation, disinformation, and digital forensics. He joined the Content Authenticity Initiative (CAI) as an advisor in June 2023. The CAI is a community of media and tech companies, non-profits, academics, and others working to promote adoption of the open industry standard for content authenticity and provenance.

Professor Farid teaches at the University of California, Berkeley, with a joint appointment in electrical engineering and computer sciences at the School of Information. He’s also a member of the Berkeley Artificial Intelligence Lab, Berkeley Institute for Data Science, Center for Innovation in Vision and Optics, Development Engineering Program, and Vision Science Program, and he’s a senior faculty advisor for the Center for Long-Term Cybersecurity. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception.

He received his undergraduate degree in computer science and applied mathematics from the University of Rochester in 1989, his M.S. in computer science from SUNY Albany, and his Ph.D. in computer science from the University of Pennsylvania in 1997. Following a two-year post-doctoral fellowship in brain and cognitive sciences at MIT, he joined the faculty at Dartmouth College in 1999 where he remained until 2019.

Professor Farid is the recipient of an Alfred P. Sloan Fellowship and a John Simon Guggenheim Fellowship, and he’s a fellow of the National Academy of Inventors.

Read More
Guest User

Photo forensics for AI-generated faces

Is your new social media connection a real person or AI-generated? In the latest installment of Hany Farid's series, he describes a technique for detecting GAN-generated faces, such as those typically found in online profiles.

In this era of generative AI, how do we determine whether an image is human-created or machine-made? For the Content Authenticity Initiative (CAI), Professor Hany Farid shares the techniques used for identifying real and synthetic images — including analyzing shadows, reflections, vanishing points, environmental lighting, and GAN-generated faces. A world-renowned expert in the field of misinformation, disinformation, and digital forensics as well as an advisor to the CAI, Professor Farid explores the limits of these techniques and their part in a larger ecosystem needed to regain trust in the visual record. 

This is the third article in a six-part series.

More from this series:

Part 1: From the darkroom to generative AI

Part 2: How realistic are AI-generated faces?


by Hany Farid

As I discussed in my previous post, generative adversarial networks (GANs) can produce remarkably realistic images of people that are nearly indistinguishable from images of real people. Here I will describe a technique for detecting GAN-generated faces, such as those typically found in online profiles.

Below you will see the average of 400 StyleGAN2 faces (left) and 400 real profile photos from people with whom I am connected on LinkedIn (right). 

Because the real photos are so varied, the average profile photo is fairly nondescript. By contrast, the average GAN face is highly distinct with almost perfectly focused eyes. This is because the real faces used by the GAN discriminator are all aligned in the same way, resulting in all the AI-generated faces having the same alignment. It seems that this alignment was intentional as a way to improve generation of more consistently realistic faces. 

In addition to having the same facial alignment, the StyleGAN faces also appear from the neck up. By contrast, real profile photos tend to show more of the upper body and shoulders. It is these within-class similarities and across-class differences that we seek to exploit in our technique for detecting GAN-generated faces.

The average of 400 GAN-generated faces (left) and 400 real profile photos (right). (Credit: Hany Farid)

Starting with 10,000 faces generated by each of the three versions of StyleGAN (1, 2, 3), we learn a low-dimensional representation of GAN-generated faces. This learning takes two forms: the classic, linear, principal components analysis (PCA) and a more modern auto-encoder. 

Regardless of the underlying computational technique, this approach learns a constructive model that captures the particular properties of StyleGAN-generated faces. In the case of PCA, this model is a simple weighted sum of a small number of images learned from the training images.

Once the model is constructed, we can ask if any image can be accurately reconstructed with this model. The intuition here is that if the model captures properties that are specific to GAN faces, then GAN faces will be accurately reconstructed with the model but real photos (that do not share these properties) will not.

Below you will see the average reconstruction error for 400 GAN faces (left) and 400 real faces (right). They are displayed on the same intensity scale, where the brighter the pixel value, the larger the reconstruction error. Here you will see that the real faces have significantly higher reconstruction errors throughout, particularly around the eyes.

The average reconstruction error for 400 GAN-generated faces (left) and 400 real profile photos (right). (Credit: Hany Farid)

By simply computing this reconstruction error, we can build a fairly reliable classifier to determine whether a face was likely to have been generated by StyleGAN. We can make this classifier even more reliable by analyzing the eiGAN-face model weights needed for the full reconstruction. 

Using this approach, we can detect more than 99% of GAN-generated faces, while only misclassifying 1% of photographic faces — those that happen to match the position and alignment of GAN faces.
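
A minimal sketch of this reconstruction-error test, using off-the-shelf PCA from scikit-learn, is shown below. The training faces, image size, and threshold are placeholders; the actual system described above also uses an auto-encoder and the model weights themselves.

```python
# Fit a linear model to flattened GAN faces, then flag images that the model
# reconstructs well (low error) as likely GAN-generated.
import numpy as np
from sklearn.decomposition import PCA

def fit_gan_face_model(gan_faces: np.ndarray, n_components: int = 50) -> PCA:
    # gan_faces: (n_images, n_pixels) array of flattened StyleGAN faces.
    return PCA(n_components=n_components).fit(gan_faces)

def reconstruction_error(model: PCA, faces: np.ndarray) -> np.ndarray:
    reconstructed = model.inverse_transform(model.transform(faces))
    return np.mean((faces - reconstructed) ** 2, axis=1)

def looks_gan_generated(model: PCA, faces: np.ndarray, threshold: float) -> np.ndarray:
    # GAN faces are well explained by the model, so their error stays low;
    # real photos, which lack the shared alignment, reconstruct poorly.
    return reconstruction_error(model, faces) < threshold

# Hypothetical usage with placeholder data standing in for 64x64 face crops.
rng = np.random.default_rng(0)
gan_faces = rng.normal(size=(1000, 64 * 64))
model = fit_gan_face_model(gan_faces)
print(looks_gan_generated(model, gan_faces[:5], threshold=1.0))
```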

Any forensic technique is, of course, vulnerable to attack. In this case, an adversary could crop, translate, and/or rotate a GAN-generated face in an attempt to circumvent our detection. We can, however, build this attack into our training by constructing our model not just on GAN faces, but also on GAN faces that have been subjected to random manipulations. (See the article in the Further Reading section below for more details on this attack and defense.)

No defense will ever be perfect, so the forensic identification cannot rely on just one technique. It must instead rely on a suite of techniques that examine different aspects of an image. Even if an adversary does not find a way to circumvent this particular detection, newer diffusion-based, text-to-image synthesis techniques (e.g., Dall-E, Midjourney, and Adobe Firefly) do not exhibit the same facial alignment as GAN-generated faces. As a result, they will fail to be detected by this specific technique. In my next post, I will describe a complementary forensic analysis that is more effective at detecting faces generated using diffusion-based techniques.

Further reading:

[1] S. Mundra, G.J.A. Porcile, S. Marvaniya, J.R. Verbus and H. Farid. Exposing GAN-Generated Profile Photos from Compact Embeddings. Workshop on Media Forensics at CVPR, 2023. 

More from this series:

Part 1: From the darkroom to generative AI

Part 2: How realistic are AI-generated faces?

Author bio: Professor Hany Farid is a world-renowned expert in the field of misinformation, disinformation, and digital forensics. He joined the Content Authenticity Initiative (CAI) as an advisor in June 2023. The CAI is a community of media and tech companies, non-profits, academics, and others working to promote adoption of the open industry standard for content authenticity and provenance.

Professor Farid teaches at the University of California, Berkeley, with a joint appointment in electrical engineering and computer sciences at the School of Information. He’s also a member of the Berkeley Artificial Intelligence Lab, Berkeley Institute for Data Science, Center for Innovation in Vision and Optics, Development Engineering Program, and Vision Science Program, and he’s a senior faculty advisor for the Center for Long-Term Cybersecurity. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception.

He received his undergraduate degree in computer science and applied mathematics from the University of Rochester in 1989, his M.S. in computer science from SUNY Albany, and his Ph.D. in computer science from the University of Pennsylvania in 1997. Following a two-year post-doctoral fellowship in brain and cognitive sciences at MIT, he joined the faculty at Dartmouth College in 1999 where he remained until 2019.

Professor Farid is the recipient of an Alfred P. Sloan Fellowship and a John Simon Guggenheim Fellowship, and he’s a fellow of the National Academy of Inventors.

Read More
Guest User

How realistic are AI-generated faces?

In the latest installment of this CAI series, Hany Farid unpacks what was once the most common computational technique for synthesizing images. Learn about telltale clues and new fast-improving technology making it more difficult to distinguish real faces from those synthetically made.

In this era of generative AI, how do we determine whether an image is human-created or machine-made? For the Content Authenticity Initiative (CAI), Professor Hany Farid shares the techniques used for identifying real and synthetic images — including analyzing shadows, reflections, vanishing points, environmental lighting, and GAN-generated faces. A world-renowned expert in the field of misinformation, disinformation, and digital forensics as well as an advisor to the CAI, Professor Farid explores the limits of these techniques and their part in a larger ecosystem needed to regain trust in the visual record. 

This is the second article in a six-part series.

More from this series:

Read part 1: From the darkroom to generative AI


by Hany Farid

Before we dive into talking about specific analyses that can help distinguish the real from the fake, it is worthwhile to first ask: Just how photorealistic are AI-generated images? That is, upon only a visual inspection, can the average person tell the difference between a real image and an AI-generated image? Because the human brain is so finely sensitive to images of people and faces, we will focus on the photorealism of faces, arguably the most challenging task for AI-generation.

Before diffusion-based, text-to-image synthesis techniques like DALL-E, Midjourney, and Adobe Firefly splashed onto the screen, generative adversarial networks (GANs) were the most common computational technique for synthesizing images. These systems are generative, because they are tasked with generating an image; adversarial, because they pit two separate components (the generator and the discriminator) against each other; and networks, because the computational machinery underlying the generator and discriminator are neural networks. 

Eight GAN-generated faces. (Credit: Hany Farid)

Although there are many complex and intricate details in these systems, StyleGAN (and GANs in general) follow a fairly straightforward structure. When tasked with creating a synthesized face, the generator starts by splatting down a random array of pixels, and then it feeds this first guess to the discriminator. If the discriminator, equipped with a large database of real faces, can distinguish the generated image from the real faces, the discriminator provides this feedback to the generator. The generator then updates its initial guess and feeds this update to the discriminator in a second round. This process continues with the generator and discriminator competing in an adversarial game until an equilibrium is reached when the generator produces an image that the discriminator cannot distinguish from real faces. Above you can see a representative set of eight GAN-generated faces.
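
For readers who like to see the adversarial game spelled out, here is a minimal PyTorch sketch of the alternating generator/discriminator updates. The tiny fully connected networks and random placeholder "real" data stand in for StyleGAN's far larger convolutional architecture and face database.

```python
# A toy GAN training loop: the discriminator learns to separate real from
# generated samples, while the generator learns to fool the discriminator.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

real_faces = torch.randn(256, data_dim)   # placeholder "database of real faces"

for step in range(1000):
    real = real_faces[torch.randint(0, len(real_faces), (32,))]
    fake = generator(torch.randn(32, latent_dim))

    # Discriminator update: learn to tell real from generated.
    d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: make the discriminator label its output as real.
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```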

In a series of recent perceptual studies, we examined the ability of trained and untrained observers to distinguish between real and synthesized faces. The synthetic faces were generated using StyleGAN2, ensuring diversity across gender, race, and apparent age. For each synthesized face, a corresponding real face was matched in terms of gender, age, race, and overall appearance.

In the first study, 315 paid online participants were shown — one at a time — 128 faces, half of which were real. The participants were asked to classify each as either real or synthetic. The average accuracy on this task was 48.2%, close to chance performance of 50%, with participants equally as likely to say that a real face was synthetic as vice versa.

In a second study, 219 new participants were initially provided with a brief training on examples of specific rendering artifacts that can be used to identify synthetic faces. Throughout the experiment, participants were also provided with trial-by-trial feedback informing them whether their response was correct. This training and feedback led to a slight improvement in average accuracy, from 48.2% to 59.0%. 

While synthesized faces are highly realistic, there are some clues that can help distinguish them from real faces: 

  1. StyleGAN-synthesized faces have a common structure consisting of a mostly front-facing person from the neck up and with a mostly uniform or nondescript background.

  2. Facial asymmetries are telltale signs of synthetic faces. They are often seen in mismatched earrings or glasses frames. (See the top-left image in the figure below.)

  3. When viewing a face, we tend to focus most of our attention in a "Y" pattern, shifting our gaze between the eyes and mouth and often missing some glaring artifacts in the background that may contain physically implausible structures. (See the top-right image in the figure below.)

  4. StyleGAN-synthesized faces enforce a facial alignment in the training and synthesis steps that results in the spacing between the eyes being the same and the eyes being horizontally aligned in the image. (See the bottom image in the figure below.)

Eight GAN-generated faces (middle); a magnified view of the fifth face (top left) revealing asymmetric earrings; a magnified view of the sixth face (top right) revealing a strange synthesis artifact on the shoulder; and an average of all eight faces (bottom) revealing that the eyes in each face are aligned to the same position and interocular distance. (Credit: Hany Farid)

Having largely passed through the uncanny valley, GAN-generated faces are nearly indistinguishable from real faces. Although we have not carried out formal studies, my intuition is that the more recent diffusion-based, text-to-image generation yields faces that are slightly less photorealistic. As this technology improves, however, it will surely lead to images that also pass through the uncanny valley.

Having seen that the visual system may not always be reliable in distinguishing the real from the fake, in my next post I will describe our first technique that exploits the facial alignment of GAN-generated faces to distinguish them from real faces.

Further reading:

[1] S.J. Nightingale and H. Farid. AI-Synthesized Faces are Indistinguishable from Real Faces and More Trustworthy. Proceedings of the National Academy of Sciences, 119(8), 2022. 

More from this series:

Read part 1: From the darkroom to generative AI

Author bio: Professor Hany Farid is a world-renowned expert in the field of misinformation, disinformation, and digital forensics. He joined the Content Authenticity Initiative (CAI) as an advisor in June 2023. The CAI is a community of media and tech companies, non-profits, academics, and others working to promote adoption of the open industry standard for content authenticity and provenance.

Professor Farid teaches at the University of California, Berkeley, with a joint appointment in electrical engineering and computer sciences at the School of Information. He’s also a member of the Berkeley Artificial Intelligence Lab, Berkeley Institute for Data Science, Center for Innovation in Vision and Optics, Development Engineering Program, and Vision Science Program, and he’s a senior faculty advisor for the Center for Long-Term Cybersecurity. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception.

He received his undergraduate degree in computer science and applied mathematics from the University of Rochester in 1989, his M.S. in computer science from SUNY Albany, and his Ph.D. in computer science from the University of Pennsylvania in 1997. Following a two-year post-doctoral fellowship in brain and cognitive sciences at MIT, he joined the faculty at Dartmouth College in 1999 where he remained until 2019.

Professor Farid is the recipient of an Alfred P. Sloan Fellowship and a John Simon Guggenheim Fellowship, and he’s a fellow of the National Academy of Inventors.

Read More
Guest User

From the darkroom to generative AI

In this era of generative AI, how do we determine whether an image is human-created or machine-made? Read the first part in a series exploring techniques used for identifying synthetic media and how we regain trust in the visual record.

In this era of generative AI, how do we determine whether an image is human-created or machine-made? For the Content Authenticity Initiative (CAI), Professor Hany Farid shares the techniques used for identifying real and synthetic images — including analyzing shadows, reflections, vanishing points, environmental lighting, and GAN-generated faces. A world-renowned expert in the field of misinformation, disinformation, and digital forensics as well as an advisor to the CAI, Professor Farid explores the limits of these techniques and their part in a larger ecosystem needed to regain trust in the visual record.

This is the first article in a six-part series.

More from this series:

Part 1: From the darkroom to generative AI

Part 2: How realistic are AI-generated faces?

Part 3: Photo forensics for AI-generated faces

Part 4: Photo forensics from lighting environments

Part 5: Photo forensics from lighting shadows and reflections

Part 6: Passive versus active photo forensics in the age of AI and social media


by Hany Farid

Stalin, Mao, Hitler, Mussolini, and many other dictators had photographs manipulated in an attempt to rewrite history. These men understood the power of photography: If they changed the visual record, they could potentially change history.

Soviet secret police official Nikolai Yezhov, pictured to the right of Joseph Stalin, was later removed from this photograph. (Credit: Fine Art Images/Heritage Images/Getty Images & AFP/GettyImages)

In the past, altering the historical record required cumbersome and time-consuming darkroom techniques. Starting in the early 2000s, powerful and low-cost digital technology began to make it easier to record and alter digital images. And today, sophisticated generative-AI software has fully democratized the ability to create compelling digital fakes.

Twenty-five years ago, I was waiting in line at the library when I noticed an enormous book in the return cart called The Federal Rules of Evidence. I thumbed through the book and came across Rule 1001 of Article X, which outlined the rules under which photographic evidence can be introduced in a court of law. The rules seemed straightforward until I read the definition of original:

An “original” of a writing or recording means the writing or recording itself or any counterpart intended to have the same effect by the person who executed or issued it. For electronically stored information, “original” means any printout — or other output readable by sight — if it accurately reflects the information.

I was struck that the definition of original included this vague statement: “... or other output readable by sight.” 

At the time, digital cameras and digital editing software were still primitive by today’s standards, and generative AI was unimaginable. The trajectory of technology, however, was fairly clear, and it seemed to me that advances in the power and ubiquity of digital technology would eventually lead to complex issues around how we can trust digital media. As we enter the new age of generative AI, these issues have only become more salient.

Although it varies in form and creation, generative AI content (a.k.a. deepfakes) refers to images, audio, or video that has been automatically synthesized by an AI-based system. Deepfakes are the latest in a long line of techniques used to manipulate reality — from Stalin's darkroom to Photoshop to classic computer-generated renderings. However, their introduction poses new opportunities and risks now that everyone has access to what was historically the purview of a small number of sophisticated organizations.

Even in these early days of the AI revolution, we are seeing stunning advances in generative AI. The technology can create a realistic photo from a simple text prompt, clone a person's voice from a few minutes of an audio recording, and insert a person into a video to make them appear to be doing whatever the creator desires. We are also seeing real harms from this content in the form of non-consensual sexual imagery, small- to large-scale fraud, and disinformation campaigns. 

A photographic image (left), and AI-generated images generated by StyleGAN (middle) and Stable Diffusion (right). (Credit: Hany Farid)

Building on our earlier research in digital media forensics techniques, over the past few years my research group and I have turned our attention to this new breed of digital fakery. All our authentication techniques work in the absence of digital watermarks or signatures. Instead, they model the path of light through the entire image-creation process and quantify physical, geometric, and statistical regularities in images that are disrupted by the creation of a fake.

In this series of posts, I will describe a collection of different techniques that we use to determine whether an image is real or fake. I will explain the underlying analyses — including analyzing shadows, reflections, vanishing points, specular reflections in the eye, and AI-generated faces — and the conditions under which the analyses are suitable.

While these and related techniques can be powerful, I need to emphasize their limitations. 

First, it is important to understand that while the presence of evidence of manipulation can tell us something, the absence of traces of manipulation do not prove that an image is real. We cannot differentiate between a flawless fake and the real thing. 

Second, nearly all forensic techniques have a limited shelf life because techniques to manipulate content are continually improving. So while these techniques are applicable today, we have to continually be aware of how generative AI is evolving. 

Finally, these forensic techniques are only one part of a larger ecosystem needed to regain trust in the visual record. This includes efforts like the Content Authenticity Initiative (to which I contribute), some regulatory pressures, education, and the responsible deployment of AI technologies.

Further reading:
[1] H. Farid. Image Forensics. Annual Review of Vision Science, 5(1):549-573, 2019.

More from this series:

Part 2: How realistic are AI-generated faces?

Part 3: Photo forensics for AI-generated faces

Part 4: Photo forensics from lighting environments

Part 5: Photo forensics from lighting shadows and reflections

Part 6: Passive versus active photo forensics in the age of AI and social media

Author bio: Professor Hany Farid is a world-renowned expert in the field of misinformation, disinformation, and digital forensics. He joined the Content Authenticity Initiative (CAI) as an advisor in June 2023. The CAI is a community of media and tech companies, non-profits, academics, and others working to promote adoption of the open industry standard for content authenticity and provenance.

Professor Farid teaches at the University of California, Berkeley, with a joint appointment in electrical engineering and computer sciences at the School of Information. He’s also a member of the Berkeley Artificial Intelligence Lab, Berkeley Institute for Data Science, Center for Innovation in Vision and Optics, Development Engineering Program, and Vision Science Program, and he’s a senior faculty advisor for the Center for Long-Term Cybersecurity. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception.

He received his undergraduate degree in computer science and applied mathematics from the University of Rochester in 1989, his M.S. in computer science from SUNY Albany, and his Ph.D. in computer science from the University of Pennsylvania in 1997. Following a two-year post-doctoral fellowship in brain and cognitive sciences at MIT, he joined the faculty at Dartmouth College in 1999 where he remained until 2019.

Professor Farid is the recipient of an Alfred P. Sloan Fellowship and a John Simon Guggenheim Fellowship, and he’s a fellow of the National Academy of Inventors.

Read More
Coleen Jose

Community story: on passion projects and nurturing trust

We spoke with fashion photographer Obidi Nzeribe about how to get started with a passion project and why creative transparency is now a crucial part of building trust with your community.

We chatted with fashion photographer Obidi Nzeribe about how to get started with a passion project and why creative transparency is now a crucial part of building trust with your community. Read the full interview and see images from Obidi’s summer project.

Read More
Coleen Jose

Community showcase: creating an ecosystem for content transparency

See how Respeecher, SmartFrame, and the Guardian Project are building trust signals and Content Credentials into their services and products.

In this community event, we showcase Content Authenticity Initiative members co-creating an ecosystem for transparency across industry and civil society. See how they integrate trust signals into sites, apps or services using the CAI’s free open-source tools, which activate the industry standard for content authenticity and provenance.

What we cover

  • The creative workflow with synthetic speech company, Respeecher, indicating when an audio file was generated through their voice cloning tool and marketplace.

  • Enabling creator attribution and displaying provenance through content management and digital publishing with SmartFrame.

  • Preserving cultural heritage and turning civic action into verifiable data with the Guardian Project’s ProofMode app and platform.

  • Q&A

Speakers

  • Coleen Jose, Sr. Marketing and Engagement Manager, Content Authenticity Initiative, Adobe

  • Dmytro Bielievtsov, CTO and Co-founder, Respeecher

  • Nathan Freitas, Director and Founder, Guardian Project

  • Patrick Krupa, Chief Product Officer, SmartFrame Technologies

Read More
Coleen Jose

Community event: authentic storytelling for the era of generative AI

How transparency tools and context meet the challenges and opportunities in generative technologies.

The astonishing accessibility and sophistication of generative AI tools are outpacing our objective understanding of where content comes from and forcing us to establish new norms. How do we distinguish realistic AI-generated content from journalistic work? How does transparency about generative AI benefit creators and consumers? How and when should we add verifiable provenance to our media?

Our relationship with synthetic content—from social media face filters to visual effects in film and entertainment—is changing with generative AI’s power to enhance creativity but also amplify mis- and disinformation.

In this community event, the Content Authenticity Initiative presented a panel to discuss how transparency tools and context meet the challenges and opportunities in generative technologies.

Speakers

  • Andy Parsons, Sr. Director, Content Authenticity Initiative, Adobe

  • Claire Leibowicz, Head of AI and Media Integrity, Partnership on AI

  • Gene Kogan, Artist, Co-Founder, Eden

  • Tom Mason, CTO, Stability AI

More from the CAI

📬 Get news and updates

🌏 Join the CAI community

🤝 Connect with us on Discord

Read More
Andy Parsons

Reaching Major Milestones with 1,000 Members, Content Credentials in Adobe Firefly, and Much More

This past month, the CAI achieved three major milestones representing the fast pace of progress toward our critical goal of making content more transparent and trustworthy, everywhere.

By Andy Parsons, Senior Director, Content Authenticity Initiative 

This past month, the Content Authenticity Initiative achieved three major milestones representing the fast pace of progress toward our critical goal of making content more transparent and trustworthy, everywhere.  

First, we are thrilled to announce that the CAI has crossed the 1,000-member mark! After launching the CAI just over 3 years ago, we are incredibly proud to have grown a global coalition of leading tech and media companies, camera manufacturers, news publishers, creative professionals, researchers, NGOs and many more – all contributing their expertise, code, and energy to advance this mission.  

This comes at a critical time: content is becoming easier to manipulate, and determining what’s authentic is challenging, to say the least. Specifically, generative AI has broken through as a creative technology for content inspiration and creation that is transforming the world. While AI has unprecedented potential to amplify human creativity, it also shines a light on the urgent need to restore trust online. Three years ago, the CAI team and partners anticipated a future of powerful AI tools and eroding trust. Through hard work and deep collaboration, we are proud to see the call for authenticity grow louder by the day at this key turning point in human history. Above all, we’re resolute in finding ways to deploy the techniques, tools, and standards that we have built together, so that everyone can benefit from them.

Leveraging Content Credentials in Firefly Beta 

At Adobe Summit last month, Adobe announced its new generative AI model, Firefly, along with our commitment to leveraging CAI’s Content Credentials to bring transparency to generative AI outputs. Every asset produced with Firefly has a Content Credential embedded in it, indicating the model used and its version. This is significant: it not only builds on our mission to ensure tools like Firefly are used responsibly, but also gives viewers of this content important context to understand what they’re seeing or hearing, enabling them to make trust decisions when necessary.

In addition, creators can use CAI provenance technology to attach “Do Not Train” credentials that travel with their content wherever it goes, giving them more control over their work. With industry adoption, this will prevent AI developers from training on content that carries a “Do Not Train” credential. Now that this universal standard is published in the latest C2PA technical specification, we believe it will be adopted quickly across the industry.
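
To make this concrete, here is a minimal sketch, written in Python, of the kind of information such a Content Credential expresses: a “created” action marking the asset as AI-generated, plus the training-and-data-mining assertion that carries a creator’s “Do Not Train” intent. The assertion labels reflect my reading of the C2PA specification; the claim generator, model name, and values are hypothetical placeholders, not the exact manifest Firefly writes.

```
import json

# Illustrative only: a simplified view of the provenance data a Content
# Credential can carry for a generative-AI output. Assertion labels follow
# my reading of the C2PA specification; the concrete values are invented.
manifest_sketch = {
    "claim_generator": "ExampleGenerativeApp/1.0",  # hypothetical tool name
    "assertions": [
        {
            # Records that the asset was created by an AI model, using the
            # IPTC "trainedAlgorithmicMedia" digital source type.
            "label": "c2pa.actions",
            "data": {
                "actions": [
                    {
                        "action": "c2pa.created",
                        "digitalSourceType": "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia",
                        "softwareAgent": "ExampleModel v1",  # hypothetical model + version
                    }
                ]
            },
        },
        {
            # The creator's "Do Not Train" intent, expressed via the
            # training-and-data-mining assertion introduced in C2PA 1.3.
            "label": "c2pa.training-mining",
            "data": {
                "entries": {
                    "c2pa.ai_generative_training": {"use": "notAllowed"},
                    "c2pa.ai_training": {"use": "notAllowed"},
                    "c2pa.data_mining": {"use": "notAllowed"},
                }
            },
        },
    ],
}

print(json.dumps(manifest_sketch, indent=2))
```

In a real workflow this data is not written by hand: C2PA-aware tooling serializes it, cryptographically signs it, and binds it to the asset so that any later tampering is detectable.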

Creative storyteller Hallease Narvaez showcased our Content Credentials features in the Firefly Beta at Summit, delivering an incredible demonstration of how CAI technology works with generative AI content – be sure to check out her video here.

C2PA Releases Key Updates to the Technical Specification  

The CAI and the non-profit standards development organization, the Coalition for Content Provenance and Authenticity (C2PA), continue to engage in deeply complementary threads of work: the C2PA hones the blueprint for the future of digital provenance, while the CAI builds a massive community to put the standard to work. We are also excited about the just-released version 1.3 of the C2PA technical specification, which includes expanded support for an array of media types as well as generative AI transparency capabilities.

This is perhaps the largest update to the spec since its initial publication in 2021. Among the highlights are: 

  • Support for additional file types, including WAV for pro audio and WebP for image delivery. 

  • Extended metadata for media creation and modifications so AI-generated ingredients can be clearly captured and displayed. 

  • “Regions of interest” for explaining which parts of media were impacted by various actions, when appropriate.  

  • Universal support for generative AI transparency and creators’ “do not train” intent. 

I am so inspired and humbled by the recent accomplishments of the CAI ecosystem. I’m also reminded of how much more work we all have ahead of us to ensure digital provenance is available to journalists, creators, and consumers wherever and whenever they need tools for authentic storytelling.

As always, if you’re interested in our work, you can join our expansive and ever-growing community here.

Join us and get started with these resources 

Read More
Coleen Jose

Community event: get started with digital provenance tools

Developer demos and tips on how to get started with the CAI's free open-source tools, activating an open industry standard for content authenticity and provenance.

What are trust signals and how do we use them to enable authentic storytelling online? 

In this community event, we share developer demos and tips on how to get started with the CAI's free open-source tools, which activate an open industry standard for content authenticity and provenance. Whether you’re an individual or organization exploring the possibilities or ready to implement provenance into your product, app or service—select the video sections and tools most relevant for your project. 

What we cover

  • An introduction to the technical standard for certifying the source and provenance of media content

  • Tips and how to get started with the CAI’s free open-source tools

  • Community showcase: open-source in action and demos with CAI member Pixelstream
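
For a quick taste of what these tools make possible, here is a minimal sketch that shells out to c2patool, the CAI’s open-source command-line tool, to print any Content Credential embedded in a file. It assumes c2patool is installed and on your PATH and that, as in the versions I have used, invoking it with a file path emits the manifest store as JSON; output details may vary between versions.

```
import json
import subprocess
import sys

def read_content_credentials(path: str):
    """Ask c2patool to report the Content Credentials embedded in a file.

    Assumes c2patool is installed and that invoking it with a file path
    prints the manifest store as JSON; exact behavior may vary by version.
    """
    result = subprocess.run(["c2patool", path], capture_output=True, text=True)
    if result.returncode != 0:
        # No credential found, unsupported file type, or tool error.
        print(result.stderr.strip() or "No Content Credential found.")
        return None
    return json.loads(result.stdout)

if __name__ == "__main__":
    manifest_store = read_content_credentials(sys.argv[1])
    if manifest_store is not None:
        print(json.dumps(manifest_store, indent=2))
```

Running this against an image exported from a Content Credentials-enabled app should display the embedded manifest data; running it against a file with no credential will simply report that nothing was found.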

 
More from the CAI

📬 Get news and updates

🌏 Join the CAI community

🤝 Connect with us on Discord

Read More
Santiago Lyon

The CAI welcomes Canon!

Welcoming Canon as the newest member of the Content Authenticity Initiative.

Canon joins the CAI

by Santiago Lyon, Head of Advocacy and Education, CAI 

We are delighted to welcome Canon, one of the world’s leading camera manufacturers, as a member of the Content Authenticity Initiative (CAI) joining more than 890 media and tech companies, NGOs, and academics working together to fight misinformation by furthering the implementation of open-source digital provenance technology.  

Founded in 1937, Canon is a leading global manufacturer of both professional and consumer cameras and lenses, and it brings a long history of innovative and ground-breaking photojournalism projects to the CAI. Canon’s equipment is widely used by photojournalists and others to capture impactful, compelling, and beautiful imagery from around the world. In recent years it has been used to capture award-winning images, including winners of Pulitzer Prizes and World Press Photo awards, among many others.

“Canon enthusiastically supports efforts to fight misinformation by ensuring the authenticity and provenance of digital images that are created and enjoyed by society. Joining the efforts of the CAI is an important step in this endeavor,” said Go Tokura, Chief Executive of Imaging Communication Business Operations at Canon. “We are looking forward to working with other technology companies and media partners to develop technology solutions that achieve these goals." 

Provenance technology provides secure, tamper-evident metadata that accompanies images and other file types along their journey from capture through editing to publication, showing the viewer where an asset came from, and any changes made to it along the way. The CAI looks forward to working with Canon on prototyping and implementing provenance technology into their future products. 
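
To see why such metadata is tamper-evident, consider the hash-and-sign principle it rests on: a digest of the asset is computed and signed, so changing even one byte of the asset breaks verification. The toy Python sketch below illustrates only that principle, using the third-party cryptography package and an Ed25519 key; it is not the actual C2PA data model, binding, or signature format.

```
import hashlib

# Toy illustration of the hash-and-sign idea behind tamper-evident metadata.
# Not the real C2PA format: it only shows why edits to the asset are detectable.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sign_asset(asset_bytes: bytes, private_key: Ed25519PrivateKey) -> bytes:
    """Sign a digest of the asset; the signature travels with the asset."""
    return private_key.sign(hashlib.sha256(asset_bytes).digest())

def is_untampered(asset_bytes: bytes, signature: bytes, public_key) -> bool:
    """Re-hash the asset and check that the signature still matches."""
    try:
        public_key.verify(signature, hashlib.sha256(asset_bytes).digest())
        return True
    except InvalidSignature:
        return False

key = Ed25519PrivateKey.generate()
original = b"...image bytes captured in camera..."
signature = sign_asset(original, key)

print(is_untampered(original, signature, key.public_key()))              # True
print(is_untampered(original + b"edited", signature, key.public_key()))  # False
```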

Today, Canon also joins the Coalition for Content Provenance and Authenticity (C2PA) as its latest member. The C2PA provides creators, consumers, and others with opt-in, flexible ways to understand authenticity and provenance across media types.

We encourage diverse organizations and individuals to join the Content Authenticity Initiative to advance our efforts for digital provenance. You can find more information here.

Read More