Approach

The CAI is designing components and drafting standards specifications for a simple, extensible and distributed media provenance solution.


Introduction

With the increasing velocity of digital content and the democratization of powerful creation and editing techniques, robust content attribution is critical to ensure transparency, understanding, and ultimately, trust. 

We are witnessing extraordinary challenges to trust in media. As social platforms amplify the reach and influence of certain content via ever more complex and opaque algorithms, mis-attributed and mis-contextualized content spreads quickly. Whether it takes the form of inadvertent misinformation or deliberately deceptive disinformation, inauthentic content is on the rise.

Currently, creators who wish to include metadata about their work (for example, authorship) cannot do so in a secure, tamper-evident and standard way across platforms. Without this attribution information, publishers and consumers lack critical context for determining the authenticity of media. This is especially true for users of creative tools that enable augmenting reality with AI or even authoring fully synthetic content, who need to be empowered to use their tools responsibly.

Ultimately, the solution to the problem of inauthentic content and the erosion of trust it causes will rely on efforts in three distinct areas: 

First is detection of deliberately deceptive media. Through a combination of algorithmic identification and human-centered verification of intentionally misleading content, the amount of inauthentic content can be reduced. However, as techniques for creating misleading content become more sophisticated and accessible, we foresee an escalating arms race impeding progress on this front. As malicious purveyors of content become faster and better, detection techniques will struggle to keep pace.

Second, education is essential. Well-intentioned creators and consumers will need to understand the danger of disinformation and the use of techniques to eradicate it. They must also understand ways to use sophisticated creative tools responsibly. These are skills that must be learned and passed on through media literacy campaigns and formal education. We must all understand why and when to trust what we see, hear and read. And we must be equipped with the tools and knowledge to do so. 

Finally, we must consider content attribution, which is the focus of this paper. Often referred to as provenance, attribution empowers content creators and editors, regardless of their geographic location or degree of access to technology, to disclose information about who created or changed an asset, what was changed and how it was changed. While detection can help address the problem of trust in media reactively by identifying content suspected to be deceptive, attribution proactively adds a layer of transparency so consumers can make informed decisions. Content with attribution exposes indicators of authenticity so that consumers are aware of who has altered content and what exactly has been changed. This ability to provide content attribution for creators, publishers and consumers is essential to engender trust online.

At the same time, it is critically important that those same content creators be able to protect their privacy when necessary. Any solution attempting to restore trust must be globally viable across technology contexts and minimize opportunities to cause unintended harms or risks. It must also have freedom of creative expression in media production at its core. 

We seek to address the issue of content authenticity at scale. To accomplish this, we propose an open, extensible approach for content attribution and have begun working toward establishing standards with broad, cross-industry collaboration. 

 

Background

At Adobe MAX 2019, the Content Authenticity Initiative (CAI) was announced by Adobe in partnership with The New York Times Company and Twitter. Since that time, this group has collaborated with a wide set of representatives from commercial organizations (software tools, publishers, social media), human rights organizations and academic research to produce this paper and the approach it describes.

 

Our Mission 

The initial mission of the CAI is to develop the industry standard for content attribution. By augmenting subjective judgments about authenticity with objective facts about how a piece of content came to be, the CAI aims to help content consumers make more informed decisions about what to trust. 

Today, most attribution information is embedded in the metadata of assets via long-established standards such as EXIF and XMP. However, most assets appear on the Web without this information intact. Content moderators, fact-checkers and end-users alike are left to reconstruct context through imperfect and inefficient methods. We will provide a layer of robust, tamper-evident attribution and history data built upon XMP, Schema.org and other metadata standards that goes far beyond common uses today. This attribution information will be bound to the assets it describes, which will in turn reduce friction for creators sharing the attribution data and enable intuitive experiences for consumers who use the information to help them decide what to trust.

We balance simplicity in use of the system with security against tampering and strong links to identity. Identity can be that of an individual, where prudent, or that of the trusted cryptographic signing entity. 
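To make the idea of tamper-evident attribution bound to an asset more concrete, the following is a minimal sketch and not the CAI's specified format or algorithm. It assumes attribution data is a small dictionary, binds it to the asset through a SHA-256 hash of the asset bytes, and uses an HMAC from the Python standard library as a stand-in for the signature of a signing entity. A real system would most likely use public-key signatures tied to an identity; the function names `make_claim` and `verify_claim` and the field names are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical_bytes(payload: dict) -> bytes:
    # Serialize deterministically so the same payload always yields the same bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_claim(asset_bytes: bytes, attribution: dict, signing_key: bytes) -> dict:
    """Bind attribution data to an asset and sign the result (illustrative only)."""
    payload = {
        # The hash ties the attribution data to this exact asset content.
        "asset_sha256": hashlib.sha256(asset_bytes).hexdigest(),
        "attribution": attribution,
    }
    # HMAC is a symmetric stand-in here for a signing entity's signature.
    signature = hmac.new(signing_key, _canonical_bytes(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_claim(asset_bytes: bytes, claim: dict, signing_key: bytes) -> bool:
    """Return True only if neither the asset nor the attribution data was altered."""
    payload = claim["payload"]
    expected = hmac.new(signing_key, _canonical_bytes(payload), hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, claim["signature"])
    asset_ok = payload["asset_sha256"] == hashlib.sha256(asset_bytes).hexdigest()
    return signature_ok and asset_ok
```

In this sketch, calling `verify_claim` with the original asset and an untouched claim succeeds, while changing either the asset bytes or any attribution field causes verification to fail, which is the tamper-evident property described above.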

There is currently no universal approach for storing attribution data appropriate for all use cases. Depending on the systems involved, this information may be large enough to make it impractical to embed in a file containing digital content (hereafter called an asset). Conversely, some creators may have privacy concerns such that neither the data associated with an asset nor the asset itself can be stored on servers in the cloud. A cloud-based system may provide durability, whereas a file-based workflow optimizes for disconnected use and the preservation of anonymity. Therefore, the CAI imagines data storage to comprise a continuum of options ranging from file-based to cloud-based, with hybrid approaches in between. Flexibility for applications in implementing persistence and flexibility for end users to choose where their data is stored are essential for widespread adoption.
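As a rough illustration of this storage continuum, the sketch below shows one way an application might choose between file-based, cloud-based and hybrid persistence. It is an assumption for illustration, not part of any CAI specification: the `Storage` enum, the `choose_storage` function, and the parameters `embed_limit`, `cloud_allowed` and `prefer_durability` are all hypothetical names.

```python
from enum import Enum


class Storage(Enum):
    FILE = "file"      # attribution data embedded in the asset file
    CLOUD = "cloud"    # attribution data held by a cloud service
    HYBRID = "hybrid"  # embedded copy plus a cloud copy


def choose_storage(data_size: int, embed_limit: int,
                   cloud_allowed: bool, prefer_durability: bool = False) -> Storage:
    """Pick a storage option for attribution data (illustrative policy only)."""
    if not cloud_allowed:
        # The user's privacy choice wins: nothing leaves the file.
        return Storage.FILE
    if data_size > embed_limit:
        # Too large to embed practically, so store it remotely.
        return Storage.CLOUD
    # Small enough to embed; optionally keep a cloud copy for durability.
    return Storage.HYBRID if prefer_durability else Storage.FILE
```

The design choice worth noting is that the end user's decision about cloud storage is checked first, so an application's size or durability preferences never override a privacy requirement.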

Increasing trust in media requires the ongoing engagement of diverse communities. The CAI does not prescribe a single unified platform for authenticity, but instead presents a set of standards that can be used to create and reveal attribution and history for images, documents, time-based media (video, audio) and streaming content. Although the initial implementations will focus on imagery, our intent is to specify a largely uniform method for enabling attribution from various points of view through which diverse stakeholders can build decentralized knowledge graphs about the trustworthiness of media.