C2PA: An innovative approach to mitigating the harms of synthetic content

Insights

  • The rise of advanced AI systems such as generative AI has driven a corresponding surge in deepfakes and in controversial, emotive misinformation and disinformation that harm businesses.
  • There is a growing need for synthetic content detection and provenance tracking to tackle these issues.
  • Various tools and techniques are already in use, including watermarking and steganography. However, these methods alone are not effective enough to prevent misinformation.
  • Extensive research points to the standards of the Coalition for Content Provenance and Authenticity (C2PA) as an effective solution.
  • C2PA is evolving rapidly and is being widely adopted. Major companies and government bodies are incorporating and recommending this technology.

Since ChatGPT’s launch, generative AI has rapidly expanded in applications, business use cases, and research. However, despite its benefits, the technology is vulnerable to bad actors.

Fraud has always been an issue, with attackers using existing tools to create misinformation. The rise of generative AI tools led to a tenfold increase in deepfakes across industries between 2022 and 2023, with fake content designed to mislead.

Deepfakes pose many risks to businesses: falling victim to misinformation and disinformation, reputational damage, and financial loss from deepfake-fueled fraud.

Deepfakes, and the tools to create them, are widespread: deepfake images of Taylor Swift garnered 45 million views before eventually being removed. OpenAI has developed Voice Engine, a tool that can convincingly clone any voice from a 15-second clip, but has not yet released it, recognizing “serious risks to the public.”

According to The Economist, 1,400 security experts told the World Economic Forum this year that disinformation and misinformation are the “biggest global risks in the next two years, even more dangerous than war, extreme weather, or inflation.”

No wonder that organizations including the BBC, the UK’s national broadcaster, are using new verification techniques to help audiences distinguish between authentic and fake content by showing them where the media comes from. At the BBC, digital signatures linked to provenance information ensure that when media is validated, the person or computer reading the image can confirm it came from the BBC. For the past three years, the BBC’s R&D team and other organizations have made strides in this arena by using a set of guidelines, standards, and specifications from the C2PA coalition, an Adobe-led initiative to advance data provenance.

What are some existing solutions?

Lack of data and content provenance, and misinformation driven by synthetic content and deepfakes, existed long before the generative AI boom, and solutions to mitigate them have been researched extensively. Industry solutions include steganography (concealing messages or information within other non-secret text or data) and watermarking and fingerprinting (creating a unique identifier or hash value for specific media content). More recently, AI models that classify content as synthetic or original across all modalities have also been researched and deployed extensively.

However, these solutions have drawbacks and are only effective up to a point. Extending these technologies reliably across all modalities is difficult; watermarks can be erased or distorted; steganography and watermarking embed only a limited amount of information about the asset; and fingerprinting only assists in creating a unique identifier for tamper detection. Hence, these technologies are useful for detecting media alteration and for creating tags that indicate IP ownership, but they reveal nothing about the exact changes made, as the sketch below shows.
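To make fingerprinting concrete, here is a minimal Python sketch (an illustration only, not any particular product’s implementation). A cryptographic hash acts as the fingerprint: even a one-byte edit yields a completely different value, so the hash flags tampering but says nothing about what changed or about earlier versions.

    import hashlib

    def fingerprint(data: bytes) -> str:
        """Return a SHA-256 digest acting as a simple content fingerprint."""
        return hashlib.sha256(data).hexdigest()

    original = b"frame-0001: original pixel data"
    edited = b"frame-0001: original pixel datA"  # a single-byte alteration

    print(fingerprint(original))  # digest unique to this exact content
    print(fingerprint(edited))    # entirely different digest: tampering is
                                  # detected, but nothing reveals what changed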

What is C2PA and how does it work?

C2PA is a coalition of members who have come together to draft technical standards and specifications for combating misinformation through provenance tracking of media assets. Although C2PA shares some members with the Content Authenticity Initiative (CAI), the two groups are separate entities with different goals. C2PA is responsible only for drafting and improving the technical specifications, whereas CAI is a community of media and tech companies, NGOs, academics, and others working to promote the adoption of an open industry standard for content authenticity and provenance. CAI is an Adobe-led cross-industry consortium that advocates for the adoption of content credentials based on C2PA standards, and it develops and maintains open-source software based on the C2PA technical specifications. Currently, the CAI tooling is available in Rust, JavaScript, C/C++, Python, and Node.js. In a nutshell, C2PA develops the end-to-end open technical standards, while CAI drives the adoption of those standards and creates open-source tools to fight misinformation.

The idea of C2PA centers on maintaining a store of manifests (see Figure 1). In simple terms, a manifest is a structured data record bound to the multimedia asset. It includes details such as assertions (modifications made to the asset), a claim, and a claim signature (digital signature), among others. The manifest can also record how the asset was made, including whether it was made by AI. Additionally, unlike fingerprinting, manifests are sequential: users can see the history of alterations to the asset, whereas a fingerprint or hash differs for each version, with no indication of previous versions. Fingerprinting helps with tamper detection, but C2PA lets users view all previous manifests, with information about what changed in the asset, who changed it, and when.

Each user has their own manifest and is responsible for adding the relevant assertions to it after modifying the asset. As a result, provenance information is preserved throughout the lifetime of the asset.
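The sketch below gives a heavily simplified, illustrative picture of a manifest store in Python. Real C2PA manifests are binary structures (JUMBF boxes with CBOR payloads) defined by the specification, so treat the field names here as conceptual stand-ins; the point is that each edit appends a new, signed manifest rather than overwriting history.

    # Conceptual sketch only: field names are illustrative stand-ins, not the
    # actual C2PA serialization (JUMBF boxes with CBOR payloads per the spec).
    manifest_store = [
        {
            "claim_generator": "ExampleCamera/1.0",
            "assertions": [{"action": "c2pa.created", "when": "2024-03-01T09:00Z"}],
            "claim_signature": "<X.509-backed digital signature>",
        },
        {
            "claim_generator": "ExamplePhotoEditor/2.3",
            "assertions": [
                {"action": "c2pa.color_adjustments", "when": "2024-03-02T14:30Z"},
                {"action": "c2pa.cropped", "when": "2024-03-02T14:31Z"},
            ],
            "claim_signature": "<X.509-backed digital signature>",
        },
    ]

    # Unlike a bare fingerprint, the store preserves the full edit history:
    for manifest in manifest_store:
        for assertion in manifest["assertions"]:
            print(manifest["claim_generator"], assertion["action"], assertion["when"])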

Figure 1. The manifest store bonded with the asset, forming the fundamental core of C2PA

Source: Infosys

Manifests can be viewed either through a command line interface using the C2PA tool, or graphically using the Verify tool, meaning that all the details related to the provenance of the asset since its creation can be viewed at a glance.
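As a sketch of programmatic inspection, reading a manifest store might look like the following. This assumes the CAI Python bindings expose a reader interface; the constructor and method names below are placeholders that vary by SDK version, so consult the SDK documentation rather than treating this as the actual API.

    # Hypothetical sketch: names are placeholders, not a guaranteed API.
    import json

    from c2pa import Reader  # CAI's open-source Python bindings

    def show_provenance(path: str) -> None:
        reader = Reader.from_file(path)             # placeholder constructor
        manifest_store = json.loads(reader.json())  # manifest store as JSON
        for label, manifest in manifest_store.get("manifests", {}).items():
            print(label, "->", manifest.get("claim_generator"))

    show_provenance("photo.jpg")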

Why should companies adopt this strategy?

C2PA has major advantages as a tool:

  1. C2PA is compatible with a wide range of multimedia formats, including images, video, audio, and documents. More formats are set to be included in the future to broaden compatibility with future media.
  2. C2PA employs several security measures, such as hashing and digital signing. It allows different hashing algorithms, including SHA2-256, SHA2-384, and SHA2-512, and supports X.509 certificates for digital signatures, which makes the asset tamper-evident (see the sketch after this list).
  3. C2PA manifest data can be stored on distributed ledger technology (DLT, or blockchain) in an external repository, or embedded in the asset itself. The Numbers Protocol – a decentralized provenance standard – is doing this already, and the C2PA specification describes methodologies and provisions for such DLT implementations in different variations.
  4. The C2PA specifications provide action labels that can be used to indicate different assertions clearly and help organize the manifest.
  5. Different parties have started to adopt and implement the CAI durable content credentials mechanism, which combines C2PA metadata with watermarking and fingerprinting. In isolation, each strategy has its flaws, but in combination they can form a robust provenance-storing mechanism.
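To illustrate the tamper-evidence mentioned in point 2, here is a minimal Python sketch using the widely available cryptography package. It signs the asset bytes with an elliptic-curve key over SHA-256 (in real C2PA deployments the signing key is bound to an X.509 certificate issued by a trusted authority); any post-signing modification makes verification fail.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec

    # In real C2PA use, this key pair would be bound to an X.509 certificate.
    private_key = ec.generate_private_key(ec.SECP256R1())
    public_key = private_key.public_key()

    asset = b"<asset bytes plus manifest payload>"
    signature = private_key.sign(asset, ec.ECDSA(hashes.SHA256()))

    # Verification succeeds on the untouched asset...
    public_key.verify(signature, asset, ec.ECDSA(hashes.SHA256()))

    # ...and fails after any modification: the asset is tamper-evident.
    try:
        public_key.verify(signature, asset + b"x", ec.ECDSA(hashes.SHA256()))
    except InvalidSignature:
        print("Tampering detected: signature no longer matches the content")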

High-profile partners are coming on board

OpenAI has joined the C2PA steering committee, while the National Institute of Standards and Technology (NIST) public drafts on responsible AI development have also cited the C2PA framework for content authentication. Along with the BBC, major companies including Google, TikTok, Sony, Leica, and Qualcomm have started incorporating the technology into their products and services. Infosys has also announced its membership, contributing to drafting C2PA guidelines.

“Partnering with C2PA in our responsible AI strategy ensures our commitment to authenticity and ethical use of content. This helps promote trust and accountability in all aspects of digital content,” says Syed Ahmed, head of the Infosys Responsible AI Office.

Prospects of C2PA in a world of deepfakes

Over 90% of online content is predicted to be synthetically generated by 2026. Crimes related to deepfakes, misinformation, and disinformation are also expected to increase. In 2022, 26% of smaller businesses and 38% of large companies faced deepfake fraud attacks, resulting in losses of up to $480,000 in that year alone.

Tools like C2PA can help tag synthetic content immediately and display its provenance information, alerting consumers.

However, C2PA has limitations. For example, C2PA metadata can be easily stripped out of the asset.

Some of these limitations have already been recognized and are being actively addressed, for instance through the work on durable content credentials, which harmonizes C2PA metadata with fingerprinting and watermarking. The combination draws on the strengths of each technique to overcome the limitations of any one of them individually: the credentials specify a new kind of signed metadata, along with measures to make that metadata durable so it persists in the face of removal attacks.
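One way to picture durable content credentials: if the embedded metadata is stripped, a fingerprint (or a decoded watermark) can still look the manifest up in a registry and restore it. The Python sketch below is purely conceptual; the in-memory registry and names such as recover_manifest are illustrative assumptions, and real systems use perceptual fingerprints that survive re-encoding rather than exact byte hashes.

    import hashlib

    # Conceptual registry mapping content fingerprints to manifests. In
    # practice this would be a hosted service using perceptual fingerprints
    # robust to re-encoding; an exact-hash dict keeps the sketch simple.
    registry: dict[str, dict] = {}

    def fingerprint(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def publish(asset: bytes, manifest: dict) -> None:
        """Record the manifest against the asset's fingerprint at signing time."""
        registry[fingerprint(asset)] = manifest

    def recover_manifest(asset: bytes) -> dict | None:
        """Look the manifest up again, even if embedded metadata was stripped."""
        return registry.get(fingerprint(asset))

    publish(b"<asset bytes>", {"claim_generator": "ExampleEditor/2.3"})
    print(recover_manifest(b"<asset bytes>"))  # manifest recovered without metadata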

C2PA is likely to become more robust, reliable, and universally used and accepted. This will happen piecemeal, as businesses implement durable content credentials mechanisms; governments endorse and support the standard; organizations collaborate and reach agreements; C2PA-compatible applications emerge; bugs, gaps, and interoperability issues are resolved; and, lastly, industry adoption widens.

“While building an AI solution, it is essential to know where the data came from and how it will be used. Data provenance, which tracks the origins and changes of data, is essential in building a transparent and trustworthy AI system,” adds Ahmed. “C2PA is creating standards for tracking and verifying the history of digital content and will enable us to build AI solutions responsibly.”

Businesses should adopt this technology early and build it into their responsible AI strategy. Development, testing, knowledge sharing, and other talent contributions will help C2PA become a mature, widely accepted, and widely used tool in responsible AI.
