Passive versus active photo forensics in the age of AI and social media

In this era of generative AI, how do we determine whether an image is human-created or machine-made? For the Content Authenticity Initiative (CAI), Professor Hany Farid shares the techniques used for identifying real and synthetic images — including analyzing shadows, reflections, vanishing points, environmental lighting, and GAN-generated faces. A world-renowned expert in the field of misinformation, disinformation, and digital forensics as well as an advisor to the CAI, Professor Farid explores the limits of these techniques and their part in a larger ecosystem needed to regain trust in the visual record. 

This is the final article in a six-part series.

More from this series:

Part 1: From the darkroom to generative AI

Part 2: How realistic are AI-generated faces?

Part 3: Photo forensics for AI-generated faces

Part 4: Photo forensics from lighting environments

Part 5: Photo forensics from lighting shadows and reflections

by Hany Farid

In the past few posts, I have described representative examples of passive forensic techniques. This category of tools for distinguishing the real from the fake assumes no explicit knowledge of the content source or specialized recording equipment. 

The benefit of these techniques is that they are applicable to a broad category of content. The drawback, however, is that they sometimes require manual intervention. The shadow and reflection analyses, for example, require identifying corresponding points on an object and its shadow or reflection. While this is a relatively simple step for a human analyst, it remains beyond the reach of automation. As a result, these techniques cannot be applied to the torrent of daily online uploads.

Active forensic techniques, on the other hand, operate at the source of creation, embedding an identifying digital watermark into the content or extracting an identifying signature from it. There is a long history of marking content to prove authenticity, indicate ownership, and protect against counterfeiting. For example, Getty Images, a massive image archive, adds a visible watermark to all digital images in its catalog. This allows customers to freely browse images while protecting Getty’s assets.

An “invisible” or steganographic watermark can be added to a digital image by, for example, tweaking every 10th image pixel so that its color (typically a number in the range of 0 to 255) is even-valued. Because this adjustment is so minor, the watermark is imperceptible. And, because this periodic pattern is unlikely to occur naturally, it can be used to verify an image’s provenance.
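
To make this concrete, here is a minimal sketch of that parity scheme in Python with NumPy. The function names and the ten-pixel period are purely illustrative, and no production system works exactly this way; the point is only to show how such a mark can be embedded and detected.

```python
import numpy as np

def embed_watermark(image, period=10):
    """Force every `period`-th pixel value (in row-major order) to be even
    by clearing its lowest bit, leaving the image visually unchanged."""
    flat = image.flatten()              # flatten() returns a copy
    flat[::period] &= 0xFE              # make the selected pixel values even
    return flat.reshape(image.shape)

def detect_watermark(image, period=10):
    """Fraction of watermarked positions that are even; values near 1.0
    are strong evidence that the watermark is present."""
    flat = image.flatten()
    return float(np.mean(flat[::period] % 2 == 0))

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
marked = embed_watermark(original)

print(detect_watermark(original))  # ~0.5 for an unmarked image
print(detect_watermark(marked))    # 1.0 once the watermark is embedded
```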

The ideal watermark is one that is imperceptible and also resilient to basic manipulations like cropping or resizing. Although the above pixel manipulation is not resilient to, for example, image resizing, many robust watermarking strategies have been proposed that are resilient — though not impervious — to attempts to remove them.
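
To give a flavor of how a more resilient mark can work, the sketch below adds a faint pseudorandom pattern, keyed by a secret seed, across the whole image and detects it by correlating against that same pattern. This is the basic idea behind spread-spectrum watermarking, shown here only as an illustration: production schemes typically embed in a transform domain and add machinery to survive cropping, resizing, and recompression, none of which is shown.

```python
import numpy as np

def key_pattern(shape, key=7):
    """Zero-mean pseudorandom pattern derived from a secret key."""
    return np.random.default_rng(key).standard_normal(shape)

def embed(image, key=7, strength=2.0):
    """Add a faint keyed pattern to the image (values kept in [0, 255])."""
    return np.clip(image + strength * key_pattern(image.shape, key), 0, 255)

def detect(image, key=7):
    """Normalized correlation with the keyed pattern: near zero when the
    watermark is absent, well above chance when it is present."""
    p = key_pattern(image.shape, key)
    x = image - image.mean()
    return float(np.sum(x * p) / (np.linalg.norm(x) * np.linalg.norm(p)))

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(256, 256)).astype(float)
marked = embed(clean)
degraded = np.clip(marked + rng.normal(0, 5, marked.shape), 0, 255)

print(detect(clean))     # ~0: no watermark present
print(detect(marked))    # well above the chance level of ~1/sqrt(N)
print(detect(degraded))  # still detectable after mild additive noise
```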

The benefit of watermarks is that identifying information is directly associated with a piece of content before it is released into the world, making identification fast and easy. The drawback to this approach is that watermarks are vulnerable to attack, where an adversary can digitally remove the watermark while leaving the underlying content largely intact.

Therefore, in addition to embedding watermarks, a creator can extract an identifying fingerprint from the content and store it in a secure centralized ledger. (This fingerprint is sometimes called a perceptual hash; see article 2 in Further Reading below.) The provenance of a piece of content can then be determined by comparing the fingerprint of an image or video that is out in the world to the fingerprints stored in the ledger. Both watermarks and fingerprints can be made cryptographically secure, making them very difficult to forge.
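
As a toy illustration of fingerprinting, the sketch below computes a 64-bit "difference hash," one common flavor of perceptual hash (not the specific methods surveyed in reference [2]), and matches an image found in the wild against fingerprints stored in a hypothetical ledger using Hamming distance. The file names, the ledger, and the match threshold are all placeholders.

```python
import numpy as np
from PIL import Image

def dhash(path, hash_size=8):
    """Difference hash: shrink the image, then record whether each pixel
    is brighter than its right-hand neighbor (a 64-bit fingerprint)."""
    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
    pixels = np.asarray(img, dtype=np.int16)
    return (pixels[:, 1:] > pixels[:, :-1]).flatten()

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return int(np.count_nonzero(a != b))

# Fingerprints recorded at creation time and stored in a ledger (placeholder).
ledger = {"press_photo_001": dhash("press_photo_001.jpg")}

# An image encountered in the wild: a small Hamming distance suggests a match
# even after mild recompression or resizing (the threshold 10 is illustrative).
query = dhash("downloaded_copy.jpg")
matches = [name for name, fp in ledger.items() if hamming(fp, query) <= 10]
print(matches)
```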

This type of watermarking and fingerprinting is equally effective for actively tracking AI-generated and human-recorded content. The benefit of this approach is that it can efficiently scan the billions of daily online uploads. The drawback is that it requires specialized hardware or software to be used at the point of creation and recording. It also requires downstream platforms like YouTube, Twitter, and TikTok to respect the watermarks, and it requires browsers to display the accompanying content labels. Although not perfect, these combined technologies make it significantly harder to create a compelling fake and significantly easier to verify the integrity of real content.
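
The point-of-creation step can be pictured as signing a small manifest that binds a hash of the content to metadata about how it was recorded. The sketch below uses a keyed MAC from the Python standard library purely as a stand-in; real provenance systems such as C2PA rely on public-key certificates and a much richer manifest format, so treat this as an outline of the idea rather than the actual protocol.

```python
import hashlib, hmac, json

# Hypothetical capture-device key; real systems use per-device certificates.
DEVICE_KEY = b"camera-secret-key"

def sign_at_capture(content, metadata):
    """Bind a manifest (content hash plus metadata) to the content."""
    manifest = {"sha256": hashlib.sha256(content).hexdigest(), **metadata}
    payload = json.dumps(manifest, sort_keys=True).encode()
    return manifest, hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()

def verify_downstream(content, manifest, signature):
    """A platform or browser re-derives the hash and checks the signature."""
    if hashlib.sha256(content).hexdigest() != manifest["sha256"]:
        return False  # content was altered after signing
    payload = json.dumps(manifest, sort_keys=True).encode()
    expected = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

photo = b"...raw image bytes..."
manifest, sig = sign_at_capture(photo, {"source": "camera", "edits": "none"})
print(verify_downstream(photo, manifest, sig))         # True
print(verify_downstream(photo + b"x", manifest, sig))  # False: content changed
```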

Importantly, this active approach is agnostic as to the truthfulness or value of content. The goal of this approach is simply, but critically, to distinguish between human-recorded and AI-generated content.

A mockup of a social media feed that integrates Content Credentials viewing and interaction. Content Credentials implement the C2PA open standard for content authenticity and provenance.

I started this series by talking about my initial journey, some 25 years ago, into the world of media authentication and forensics, and so I'll end with some thoughts on what may lie ahead.

A combination of many factors has contributed to a complex and at times chaotic online information ecosystem. Coming on top of democratized access to powerful editing and creation software, the widespread deployment of generative AI is poised to significantly complicate this environment. Harms ranging from the creation of non-consensual sexual imagery to small- and large-scale fraud and disinformation have already emerged, even in these early days of the AI revolution.

Legitimate concerns have also been raised regarding creators' rights and their very livelihoods. Here we can draw on some historical analogs. The introduction of photography did not, as some 19th-century artists feared, destroy art or artists. It disrupted some practices, like portrait painting, but it also liberated artists from realism, eventually giving rise to Impressionism and the modern art movements.

Similarly, the digitization of music did not kill music or musicians, but it did fundamentally change the way we produce and listen to music, and it spawned new genres. This history suggests that generative AI does not have to lead to the demise of creators, and that it can, with proper care and thought, be a new enabling medium.

There is much to be excited about during these early days of the AI revolution. At the same time, there are legitimate and real concerns about how these new technologies are being weaponized against individuals, societies, and democracies, and how they might disrupt large sections of the workforce. Advances in AI do not need to lead us into an unrecognizable dystopian future. But technologists, corporate leaders, and regulators need to carefully consider how they develop and deploy these technologies, building in proper safeguards from conception through rollout.

Further reading:

[1] H. Farid. Watermarking ChatGPT, DALL-E and Other Generative AIs Could Help Protect Against Fraud and Misinformation, The Conversation, March 27, 2023.

[2] H. Farid. An Overview of Perceptual Hashing. Journal of Online Trust and Safety, 1(1), 2021.

[3] Z. Epstein, et al. Art and the Science of Generative AI. Science, 380(6650):1110-1111, 2023.

Author bio: Professor Hany Farid is a world-renowned expert in the field of misinformation, disinformation, and digital forensics. He joined the Content Authenticity Initiative (CAI) as an advisor in June 2023. The CAI is a community of media and tech companies, non-profits, academics, and others working to promote adoption of the open industry standard for content authenticity and provenance.

Professor Farid teaches at the University of California, Berkeley, with a joint appointment in electrical engineering and computer sciences at the School of Information. He’s also a member of the Berkeley Artificial Intelligence Lab, Berkeley Institute for Data Science, Center for Innovation in Vision and Optics, Development Engineering Program, and Vision Science Program, and he’s a senior faculty advisor for the Center for Long-Term Cybersecurity. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception.

He received his undergraduate degree in computer science and applied mathematics from the University of Rochester in 1989, his M.S. in computer science from SUNY Albany, and his Ph.D. in computer science from the University of Pennsylvania in 1997. Following a two-year post-doctoral fellowship in brain and cognitive sciences at MIT, he joined the faculty at Dartmouth College in 1999 where he remained until 2019.

Professor Farid is the recipient of an Alfred P. Sloan Fellowship and a John Simon Guggenheim Fellowship, and he’s a fellow of the National Academy of Inventors.