Artificial Intelligence

Insights into Coalition for Content Provenance and Authenticity (C2PA)

There is an emerging need for techniques that can detect synthetic content and track content modifications in general. Among methods such as watermarking, AI detection and steganography, the specification published by the Coalition for Content Provenance and Authenticity (C2PA) is a novel technique for tracking asset provenance. This whitepaper explores different aspects of the C2PA provenance tracking method and how it can be used to distinguish between original content and heavily edited or AI-generated content.

Insights

  • The rise of sophisticated AI systems has been accompanied by a corresponding rise in deepfakes, misinformation and similar abuses. Owing to its multiple benefits, C2PA has become one of the leading candidates for universal content provenance tracking among several competing techniques.
  • This white paper examines the C2PA method in detail and covers how C2PA functions, its implementation, advantages, potential areas of improvement and future work. It also includes a reference use case.

Introduction

The Content Authenticity Initiative (CAI) is a cross-industry coalition leading the global effort to address digital misinformation and content authenticity [1]. The coalition includes industry giants such as Adobe, Truepic, Microsoft, ARM, Intel and many more [2]. C2PA is a coalition of members responsible for drafting a set of specifications and standards (the ‘C2PA technical specifications’, referred to interchangeably as C2PA in this paper) for content authenticity through provenance tracking, while CAI is the coalition that builds tools and systems implementing these specifications, among other work. To be clear, the two are separate entities even though they share some members. C2PA is responsible only for maintaining the standards and relies on entities such as CAI to implement its specifications correctly. These standards and specifications have been implemented in c2patool (GitHub repository), which is supported by the CAI [3]. However, as of drafting this paper, CAI states that c2patool is an early pre-release and may have bugs and unimplemented features. The libraries are available in Rust, C/C++, Python and Node.js. For our research and analysis, we used the Python version.

Implementation

C2PA uses some technical terms. A simple breakdown of a few important terms is given below:

  • Manifest: A small database attached to the asset which contains provenance information such as claims, assertions, details of the actor and so on.
  • Actor: An entity that participates in the C2PA ecosystem, typically non-human (hardware or software). (The specification is inconsistent on this point as of authoring this paper: version 2.0 states that actors should only be non-human entities [4], while the glossary definition of actor covers both human and non-human entities [5]. In this paper we use the latter, broader definition.)
  • Signer: The actor whose credential’s private key is used to sign a claim on an asset.
  • Asset: The media file in question, which could be an image, video, document or audio file.
  • Assertions: Data structures, each representing a statement asserted by an actor concerning the asset. This data is part of the C2PA manifest.
  • Claim: A digitally signed, tamper-evident data structure which references assertions, redactions and other information necessary to represent the content binding.

More terms and their detailed descriptions can be found on the C2PA specifications website [5]. The concept of C2PA is centered on maintaining a manifest store associated with the asset. The manifest store consists of manifests that are cryptographically signed by actors over the lifetime of the asset. The ‘Active manifest’ is the latest manifest in the manifest store [Fig 1].

Fig 1. Structure of the manifest store bound to the asset, which forms the core of C2PA. The overview diagram hosted on the C2PA specification website also provides a clear depiction [6].
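
The relationship between asset, manifest store and active manifest can be sketched in a few lines of Python. This is only an illustrative model: the class and field names (`Assertion`, `Manifest`, `ManifestStore`, `signer`) are hypothetical and are not the data model of the C2PA specification or of any CAI library, and signatures and content bindings are omitted entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assertion:
    label: str  # e.g. "c2pa.actions"
    data: dict  # statement made by an actor about the asset


@dataclass
class Manifest:
    signer: str  # actor whose key signed the claim (hypothetical field)
    assertions: List[Assertion] = field(default_factory=list)


@dataclass
class ManifestStore:
    # Manifests are kept in the order in which they were signed.
    manifests: List[Manifest] = field(default_factory=list)

    def add(self, manifest: Manifest) -> None:
        self.manifests.append(manifest)

    @property
    def active_manifest(self) -> Optional[Manifest]:
        # The active manifest is the latest manifest in the store.
        return self.manifests[-1] if self.manifests else None


store = ManifestStore()
store.add(Manifest("Camera vendor", [
    Assertion("c2pa.actions", {"actions": [{"action": "c2pa.created"}]}),
]))
store.add(Manifest("Editing tool", [
    Assertion("c2pa.actions", {"actions": [{"action": "c2pa.edited"}]}),
]))

print(store.active_manifest.signer)  # prints "Editing tool"
```

Each new signing appends a manifest, so earlier manifests stay available as the asset's history while the most recent one describes its current state.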


There are 2 main actions performed on the manifest:

  • Reading:
    Existing manifests can be read using the c2patool library or any other C2PA implementation. Reading can also be done graphically for simpler interaction [Fig 2]. The Content Credentials organization provides an online tool for this purpose [7]. How this tool works is explained further on their website [8]. Together, these give a sense of how C2PA is envisioned to be implemented across software domains such as apps and websites.
  • Signing:
    To sign a new manifest, the actor must first generate cryptographic keys for the digital signature algorithm they want to use. Software such as OpenSSL can be used to generate these keys. Once the keys are generated, the actor can invoke the ‘sign_file’ method with the appropriate arguments to complete the signing process. The signed file can then be sent or uploaded for further distribution.
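
As a rough illustration of what is prepared before signing, the sketch below builds a manifest definition as JSON using only the Python standard library. The field names (`alg`, `claim_generator`, `title`, `assertions`) follow the general shape of the sample manifest definitions published by CAI, but they are assumptions here and should be checked against the current tool documentation. The helper `build_manifest_definition` and the value `example_app/0.1` are hypothetical. The call to ‘sign_file’ itself is not shown, because its arguments depend on the library version.

```python
import json


def build_manifest_definition(title, actions, alg="ps256",
                              claim_generator="example_app/0.1"):
    """Build a manifest definition dict (hypothetical helper, assumed schema)."""
    return {
        "alg": alg,                          # signing algorithm to request
        "claim_generator": claim_generator,  # software producing the claim
        "title": title,
        "assertions": [
            {
                "label": "c2pa.actions",
                "data": {"actions": [{"action": a} for a in actions]},
            }
        ],
    }


definition = build_manifest_definition(
    "family_photo.jpg",
    ["c2pa.color_adjustments", "c2pa.cropped"],
)
manifest_json = json.dumps(definition, indent=2)
print(manifest_json)
```

The resulting JSON string, together with the generated key and certificate, is the kind of input a signing call would take.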

Fig 2. Graphical reading of C2PA manifest using the CAI verify tool [7]. The image used here was generated by ChatGPT.


Real-Life use case

In early March 2024, the official ‘princeandprincessofwales’ Instagram account posted a photograph of Kate Middleton with her children. The photo immediately came under scrutiny because it contained both obvious and subtle edits and flaws, indicating that it had been manipulated [9]. Following this incident, mainstream media outlets issued multiple notices to remove the altered source image and discontinue the related article(s). The original post no longer exists on Instagram.

To understand the potential and impact of the C2PA tool in avoiding misinformation, consider the recent Kate Middleton case described above. The following assumptions are made for this case:

  • Every editing software is C2PA enabled, and assertions are automatically added to the manifest and bound to the asset before it is dispatched.
  • Every viewing application, such as Instagram, X (formerly known as Twitter) and websites, is also C2PA enabled.

Following the edits made by the source, the corresponding assertions would have been automatically incorporated into the manifest by the editing software before the image was posted on media channels. These would then have been visible to consumers at a glance, as depicted on the CAI website [8] or as shown in [Fig 2], as opposed to the real-life scenario, in which the edits were first noticed by professionals.

Additionally, this would have alerted media channels and other consumers that the source image was heavily altered, along with the details of the alterations, before they reposted it to other channels. It would also have helped consumers realize that this was an instance of disinformation that should be disregarded. Lastly, if such a system had already been prevalent, consumers would have had the opportunity to judge the authenticity of the image independently.

Advantages of C2PA

  • Security:

    C2PA security consists of 2 main components:
    • Hashing: Hashing is the process of converting any input, such as a file, into a fixed-length string of characters [10]. One widely used hashing method is SHA2-256, where 256 indicates that the output digest is 256 bits long regardless of the input length. In the C2PA framework, hashing is performed over the content and metadata of the digital asset, and the resulting hash is stored in the C2PA metadata section [11][12]. Currently, C2PA allows the SHA2-256, SHA2-384 and SHA2-512 methods [11]. Assets of different formats are hashed in different ways. For example, a general box hash is used for box-based formats that are not BMFF-based [13].
    • Digital Signatures: Following the hashing process, digital signing is performed. Specifically, C2PA leverages X.509 certificates issued by a CA (certificate authority) such as DigiCert to create a digital signature over the asset’s manifest. These digital signatures can also be used to detect manifest tampering. Finally, the signature is stored in the manifest itself. C2PA supports a wide variety of signing algorithms [14].
  • Time stamping feature:

    Another key factor that adds substantial security to the authentication process is time stamping, which is part of the digital signing process. During the creation of the X.509 certificate or the signing of an asset, the time of signing is verified with a TSA (time-stamp authority) server. This helps resolve a considerable number of provenance disputes, such as distinguishing between the original image and an image falsely claimed to be the original. In version 2.0 of the C2PA specifications, only X.509 certificates may be used for signing.
  • Support for distributed ledger technology (DLT):

    C2PA specifications 1.0 describe how a DLT, or any other external manifest repository, can be used to store manifests. One of the design goals also states, “Do not require cloud storage but allow for it.” [15]. C2PA does not recommend implementing a DLT-based system, since data stored on a DLT is immutable. Hence, if an actor uploads personal information, it could lead to data leakage, and the redaction mechanism of C2PA would not help [16]. Instead, C2PA mentions a common use case where a hash of a C2PA manifest, or other cryptographic proof, may be stored immutably within a DLT. This may be used to prove that the C2PA manifest has not been altered or removed [16]. Regardless, the main point is the general capability of C2PA to support an external manifest repository, which opens avenues for multiple innovative and robust C2PA implementations with enhanced security. The Numbers Protocol has already developed such a framework for a decentralized provenance system [17].
  • Manifest control with labels:

    The schema-based organization of data in C2PA makes manifests easier to navigate, and it adds a small layer of security by rejecting arbitrary fields that could be used maliciously by bad actors. C2PA also provides mechanisms for carrying Exif (Exchangeable Image File Format) and IPTC (International Press Telecommunications Council) metadata, asset references and asset types as assertions, using pre-defined fields and labels [18].
    With the changes in version 2.0, only specific fields from IPTC and Exif can be incorporated [18]. C2PA thus provides flexibility through diverse types of assertions, which help add more detail about the asset.
  • Adoption rate:

    So far C2PA has been incorporated into the DALL·E 3 model, and Google and TikTok have also agreed to integrate it with their technologies. Apart from software and media companies, hardware companies such as Qualcomm, Leica, Sony and Intel are integrating C2PA natively into chipsets, cameras and other products [19][20][21]. C2PA is already being adopted at a fast pace by large corporations, and more companies are expected to join this momentum, potentially leading to a revolutionary change. Lastly, OpenAI has now joined the C2PA steering committee, which is a significant milestone [22].
  • Durable content credentials:

    With the latest developments, C2PA has introduced the concept of durable content credentials. The idea is that combining multiple strategies such as watermarking, fingerprinting and web3-based decentralized storage can yield an extremely robust provenance storage mechanism: each strategy has its own flaws in isolation, but in combination they compensate for one another’s flaws [23]. It must be recognized that no implementation of this concept exists yet.
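
The hashing step described under Security above can be illustrated with Python's standard `hashlib` module. This is a minimal sketch, not the C2PA hashing procedure: it hashes a whole byte string rather than format-specific ranges or boxes, and it omits the digital signature that protects the recorded hash from being replaced. The helper names `content_hash` and `is_untampered` are hypothetical.

```python
import hashlib


def content_hash(data: bytes, alg: str = "sha256") -> str:
    """Return the hex digest of data; alg may be sha256, sha384 or sha512."""
    return hashlib.new(alg, data).hexdigest()


def is_untampered(data: bytes, recorded_hash: str, alg: str = "sha256") -> bool:
    """Compare the current hash of data with the hash recorded at signing."""
    return content_hash(data, alg) == recorded_hash


original = b"pixel data of the original asset"
recorded = content_hash(original)  # stored in the manifest at signing time

# The digest length is fixed regardless of the input length.
short_digest = content_hash(b"a")
long_digest = content_hash(b"a" * 1_000_000)
print(len(short_digest) * 4, len(long_digest) * 4)  # prints "256 256"

# Any change to the content produces a different digest.
edited = original.replace(b"original", b"modified")
print(is_untampered(original, recorded), is_untampered(edited, recorded))  # prints "True False"
```

In a real manifest the recorded hash is covered by the signer's signature, so an attacker cannot simply recompute and replace it after editing the asset.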

Areas of Improvement

It must be noted that we give C2PA the benefit of the doubt, since there are many unknown variables, information that is withheld to preserve confidentiality, a constant state of development, unprecedented scenarios and other good reasons why the functioning and structure of C2PA have been designed the way they exist today. As mentioned previously, the CAI (developer of the C2PA tool) and the C2PA specifications body are 2 separate entities with different functions and responsibilities; however, this distinction does not play a role from the consumer’s perspective.

Hence, the attribution of each area of improvement to either entity is left to the reader’s judgment, and the evaluation points below are directed at both the tool and the specifications.

  • Size of C2PA metadata

    An asset can be compressed, edited, cropped, etc. several times during its lifetime. We therefore ran an experiment to measure the difference in asset size before and after signing 3 manifests. We downloaded a random image from X [24], initially in JPG format, as a real-life data point. We then converted this image to other formats such as png, tif and svg to cancel out any bias. This also changed the original size of the asset. We followed the same procedure for audio and video formats. For pdf and gif, we had to use different assets because conversion was not possible due to the nature of these formats. Finally, for the manifest we used the standard template provided on the CAI website [25] and signed using the ps256 algorithm. The following results were generated [Table 1] [Fig 3]:

    Table 1: Asset format, original size and size added to the asset by 1 manifest.

    Format type    Original size (KB)    Change in size (KB) after signing 1 manifest
    jpg            82                    105 ± 1
    png            525                   106 ± 1
    svg            68                    19 ± 0.1
    webp           58.5                  14 ± 0.4
    tif            1907                  15 ± 0.1
    pdf            84.6                  0
    gif            339                   0
    flac           404                   0
    mp3            157                   14 ± 1
    wav            1767                  15 ± 1
    avi            725                   14
    mp4            721                   15
    Fig 3: Size of manifest added for each signing for every format.

    Fig 4: Bar graph comparing the original asset size and the size of the added manifest.

    It can be seen clearly that, for formats such as jpg and png, the size added by a single manifest (~105 KB) is substantial relative to the asset size and, for the jpg asset, even exceeds it [Fig 4]. The size added differs across formats because the manifest is embedded differently for every format type. Nevertheless, if this experiment is scaled to a larger number of manifests, for example 1000 manifests, even formats with a smaller size change such as webp would consume considerable storage overall. In this case it would be:
    58.5 KB + (14 KB * 1000) = 14.0585 MB
    It follows that, at the same scale, the storage requirement for jpg and png would be substantially higher, since a single manifest alone consumes ~100 KB. We also noticed in our experiment that longer manifests consume slightly more space than the template manifest used in the experiment. In conclusion, C2PA manifests consume significant storage, so the scalability design goal could be one of the biggest challenges to overcome in future. It should be emphasized that the specifications do mention that manifests can be stored in the compressed ‘brob’ box format; however, this functionality does not yet exist in the tool and was therefore not evaluated.
    This issue is further compounded by the fact that, unlike Exif metadata, whose maximum size is restricted to 64 KB in jpeg images, the C2PA specifications do not define a maximum size limit. At the same time, C2PA cannot impose a size limit, since this would prevent the addition of tracking information beyond a certain point, which would defeat the purpose of provenance tracking.
    In the far future, this problem might still persist even if the manifests are stored in an external repository, including a DLT, since these records need to be maintained indefinitely, which could burden infrastructure and increase costs.
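    The scaling estimate above can be reproduced for any format in Table 1 with a short calculation. The sketch below uses the measured values from Table 1 and assumes, as the estimate in the text does, that every additional manifest adds the same size as the first one. The helper name `total_size_kb` is hypothetical.

```python
# (original size in KB, size added by 1 manifest in KB), taken from Table 1
TABLE_1 = {
    "jpg": (82, 105),
    "png": (525, 106),
    "webp": (58.5, 14),
    "tif": (1907, 15),
    "mp4": (721, 15),
}


def total_size_kb(fmt: str, manifests: int) -> float:
    """Estimated asset size after signing, assuming linear growth per manifest."""
    original_kb, added_kb = TABLE_1[fmt]
    return original_kb + added_kb * manifests


for fmt in ("webp", "jpg", "png"):
    size_mb = total_size_kb(fmt, 1000) / 1000  # decimal megabytes
    print(f"{fmt}: {size_mb:.2f} MB after 1000 manifests")
```

    Under this linear assumption, the jpg and png cases land at roughly 105 MB and 106.5 MB respectively, compared with about 14 MB for webp.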
  • Embedding of manifests and file format issues

    While C2PA currently supports a wide range of formats across media types such as png, pdf and avi, some widely used image formats such as bmp and xcf and other formats such as xls and csv are not yet supported. One of the design goals of C2PA is to support all standard formats across media types such as images, videos, audio and documents. However, the future and feasibility of this goal are still unknown. Every format is different and does not support every metadata format. For example, png has historically lacked the native Exif support that jpg has. This is also why image format converters (websites) often simply strip away the metadata. Hence, making C2PA compliant with every standard across all media types is a challenging problem.
    Nevertheless, giving C2PA the benefit of the doubt and assuming it does develop support for various formats, a bigger problem remains: media format conversion. This issue is crucial since not all websites support all asset formats. For example, the popular website ‘Medium’ does not accept ‘.webp’ images, which is the image format produced by the OpenAI DALL·E model. Hence the need to convert before uploading.
    C2PA needs native software solutions that can separate the manifest, convert the asset format, and embed the manifest again with the relevant assertions added automatically and a new final signature. Currently, Adobe Photoshop performs a similar process: it copies all the bytes of the manifest store and correctly adds a new manifest after the asset is converted.
    It can prove extremely challenging for companies to develop their own C2PA-type tool, since cross-media-type conversion scenarios that produce derived assets and composed assets may also need to be considered. For instance, a frame may be saved in png format from a C2PA-enabled composed video asset and then converted to pdf. Another case is a Word document or pdf into which images are inserted: the image manifests now need to be combined with the pdf manifest (or, at the least, referenced properly). Conversely, when an image is downloaded from that document, the manifest associated with the image should be extracted automatically and the image downloaded along with it.
    If the image is edited inside the pdf, the manifest management processes can become extremely convoluted. These scenarios could require collaboration among multiple organizations, which could further burden them with added costs, delays in product and service launches, etc. Hence there is a need for native software support, since not all companies can afford to divert resources towards resolving this issue with their own C2PA implementation. In our research, we found no indication that C2PA plans to develop such functionality in its tool.
  • Loopholes and bad actors’ strategies

    Due to the nature of the C2PA tool, there is a heavy reliance on actors following the specifications and not introducing threats. In practice this may not hold, especially given the lack of supervision over the content entered into the manifest and over automation. This opens up a network of major challenges that bad actors can cause, as follows:
    • Alternative metadata reliance issue: The C2PA specifications provide mechanisms and labels to port data from other metadata formats such as Exif, TIFF and Photoshop. Even though timestamping is one of the strongest pillars of proving provenance, it could be weakened in legal situations, since it could be argued that the data in the other metadata formats is true while the data in C2PA is not. For example, consider a legally escalated situation where the true provenance between 2 image assets needs to be determined. Naturally, one might assume that the asset whose C2PA manifest has the earlier timestamp should be considered the original. However, a bad actor could set the ‘DateCreated’ tag in the Photoshop metadata to a false timestamp and port it into the C2PA manifest under the label ‘photoshop:DateCreated’ (or any relevant combination of other metadata and its associated tag) [18]. They could then argue in court that the asset in question is originally theirs, since its date of creation precedes the timestamp on the opposing party’s original asset, and that because C2PA allows this feature, the ported date should be considered a valid time of creation. Simply put, the timestamp on a manifest does not necessarily reflect the actual time of creation and, by extension, ownership. This legal situation would be especially relevant when legacy assets are ported and made C2PA enabled.
      Intellectual property cases are less severe on a personal level, but cases in which victims must prove their innocence against false accusations and forgeries are substantially more serious. Moreover, the situation could be worse for parties who are not technically adept, for whom proving the original provenance of the asset would be extremely challenging.
    • Redaction mechanism and faulty assertions filling: The redaction mechanism was created to help actors redact assertions partially or completely [26]. However, it is a double-edged sword. A bad actor may use redaction, omit assertions or add incorrect assertions to develop controversial, emotion-inciting assets that spread misinformation/disinformation with no legal accountability, since the identity of the bad actor could remain unknown. This is because one of the C2PA design goals is to maintain privacy [27], and therefore the addition of personally identifiable information (PII) is not required in C2PA. Additionally, actors can choose to insert a generalized ‘issuer’ name in the claim signature or even forge another entity’s name.
    • Verisimilar manifests: The durable content credentials mechanism will not necessarily make the system foolproof. Covert and non-covert watermarks can be removed, changing a single pixel value changes an exact hash-based fingerprint, and producing realistic-looking but faulty metadata is also easy. All these strategies can be used by bad actors or careless actors to forge assets and create duplicates. While these can be proved faulty in court, most of them would still reach consumers’ eyes. Additionally, bad actors can store these duplicates in multiple repositories/DLTs, which could confuse consumers as to the true provenance of the asset. This might also result from normal usage rather than deliberate malicious intent.
    • Real-life example of potential misinformation: According to the hackerfactor blog [28], in March 2024 the British Broadcasting Corporation (BBC) changed the manifest of its own video of Haiti violence 4 times [29] (3 changes are documented on the blog and the latest, 4th version is currently live on the BBC website [28][29]) and also updated the video at least once. The BBC video comprises 2 videos and some filler images with multiple edits. Although some explanation has been given about the sources of the 2 videos, the lack of accurate assertions in the C2PA metadata is the major contributor to misinformation (an out-of-context video) in this case. Accurate assertions would have shown details of the edits performed on the 2 videos, such as animations, removal of audio, merging of the 2 videos, time stamps of each video crop, the addition and details of the location filler image, and other such details. Unfortunately, not only is this information lacking, but there is also a blue BBC Verify checkmark (a watermark on the video) that creates an illusion of guaranteed provenance which is false. Another point of note is that although correctly updating the manifests rectifies the problem of true provenance, misinformation spreads in the first days after posting, owing to large viewership and the trending relevance of the topic. Hence, rectification after the fact may not be effective in re-informing viewers with the correct information.
      At the time of writing this paper, we did not have access to the earlier published versions of the C2PA manifests and the BBC videos, apart from the information published on the blog source [28], but we assume there was no intention to spread misinformation in this case. However, the case does highlight the need for a more robust process to manage the manifest.
  • Virus delivery security risk

    Most cybercrime attacks involve a user downloading malware in the form of files. In a few cases, the attack can also proceed in 2 steps. The first step is to deliver the malicious code onto the user’s device, and the second is to run an application that can execute that code. The second step is much harder, since the user needs to be tricked into downloading that application.
    Though C2PA’s cryptographic protection mechanism is a strong suit, a bad actor can easily add malicious code and pass the asset on, since there is no limit on the number of characters that can be added to the label entries. This code can then travel under the cover of that protection mechanism. In cases of direct malware download such as Trojan attacks, the metadata associated with an asset is generally not scanned by many standard antivirus programs, and even if it is scanned, the malicious lines of code could appear as normal text and bypass the antivirus check.
    Alternatively, in situations like Trojan attacks, if the asset returns to the bad actor after circulation, they may be able to retrieve issuer information, because the asset has been checkpointed on those issuers’ systems, and any other PII would be an additional bonus for identifying and targeting those specific users in further steps of the attack. Removing PII is not an effective solution, since some information needs to be incorporated at some level to establish the provenance of the asset and to hold parties accountable in legal situations. This creates a paradox that poses a huge security risk.
  • Sociotechnical issues

    As mentioned in problem 3 (loopholes and bad actors’ strategies), different strategies could be employed by bad actors. Hence, in a future scenario, multiple sociotechnical risks could arise and burden consumers:
    • Information overload: An asset with long manifests could consume time and mental effort in deriving the true provenance and understanding the meaning behind the asset. For instance, an image with tens of subtle edits would require the consumer to spend time understanding the asset, and further time reasoning about why some entity would make a certain edit.
      A common consequence of information overload, which in this case becomes a risk, is that consumers might end up ignoring the asset and its provenance, since doing so is easier than navigating the manifests.
    • False provenance acceptance: If a duplicate, such as a screenshot of an original asset, were uploaded to a repository other than the one holding the original asset, the chain of provenance would be broken, and consumers exposed to the duplicate would assume the wrong provenance. Bad actors could use this strategy effectively too.
    • Distrust in the provenance: A consequence of the duplicate problem above arises when a consumer comes across 2 different provenance chains for the same asset. This can confuse the consumer and create a sense of distrust.
    • Long manifests problem: If an asset has a long list of manifests, there is a risk that the consumer rejects the provenance of the asset due to the information overload problem mentioned above. The opposite could also be true: the consumer accepts a long chain of provenance in which some manifests have been forged by bad actors but appear genuine. Lastly, there is a risk that consumers are indecisive in these situations and reject the asset’s provenance because it is easier to do so, even though the provenance was true.
    • Complacency and normalization: Repeated exposure to the above scenarios could lead consumers to start ignoring the provenance of assets, which could foster disbelief in the entire system.
  • Timestamp

    While using a TSA can prove useful for provenance tracking, the downside is that offline devices such as professional cameras cannot use this service. In that case the timestamping feature has to rely solely on the system clock, which a bad actor can easily change, leading to false manifest generation. It is worth noting that it is not publicly known how companies like Sony and Leica implement trustworthy and reliable time stamping.
  • Stripping of the entire metadata

    Simply uploading C2PA-enabled images to social media sites such as Reddit, or to various image format converter sites, can strip away all the metadata. The effect is equivalent to capturing a screenshot of the original image. All provenance information can thus easily be lost before the asset reaches other parties, and duplicates can also be created. How far the durable content credentials mechanism mitigates this issue remains to be seen.
  • Interoperability issues

    Since implementation of the C2PA specifications is left to organizations, it opens avenues for faulty implementations. Bugs and gaps between multiple implementations are a common result. In an experiment we conducted, we found that a ‘numbersprotocol’ library [31] and the CAI C2PA tool library both use ‘c2patool’ as the program name in the command line interface (CLI), which caused a conflict and produced an error. While this issue is small, the bigger risk associated with multiple implementations is that bad actors can exploit them to create deceptive falsified manifests or arbitrary manifests, or to build malicious features such as skipping validation of earlier manifests and continuing to sign new ones. They could also deliberately create a long manifest chain to cause information overload and make users ignore the asset and its provenance.
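
The ‘alternative metadata reliance’ loophole described above can be made concrete with a small check that separates the trusted TSA time from self-declared dates ported from other metadata. This is a sketch only: the dictionary layout and the field names `tsa_time` and `declared_dates` are hypothetical and do not correspond to the actual manifest schema. The check cannot tell whether a declared date is false; it only shows which dates are unverifiable claims.

```python
from datetime import datetime


def parse(timestamp: str) -> datetime:
    return datetime.fromisoformat(timestamp)


def unverified_earlier_claims(asset: dict) -> list:
    """Labels of self-declared dates that precede the trusted signing time."""
    signed_at = parse(asset["tsa_time"])
    return [
        label
        for label, value in asset["declared_dates"].items()
        if parse(value) < signed_at
    ]


def earliest_by_trusted_time(assets: list) -> dict:
    """Rank assets by TSA time only, ignoring self-declared dates."""
    return min(assets, key=lambda a: parse(a["tsa_time"]))


asset_a = {
    "name": "original.jpg",
    "tsa_time": "2024-03-01T10:00:00+00:00",
    "declared_dates": {},
}
asset_b = {
    "name": "copy.jpg",
    "tsa_time": "2024-03-05T09:00:00+00:00",
    # Ported from Photoshop metadata; the value is whatever the actor typed.
    "declared_dates": {"photoshop:DateCreated": "2024-02-01T00:00:00+00:00"},
}

print(earliest_by_trusted_time([asset_a, asset_b])["name"])  # prints "original.jpg"
print(unverified_earlier_claims(asset_b))  # prints "['photoshop:DateCreated']"
```

A viewer that displayed self-declared dates alongside the TSA time without distinguishing them would present asset_b as the older of the two, which is exactly the ambiguity a bad actor could exploit.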

Future work

To create an ideal-world, reliable synthetic content detection and provenance tracking system using the CAI C2PA tool, we believe the following features would be needed from C2PA and other entities:

  • All editing software and other applications should automatically add assertions into the manifest and applications which do not follow C2PA must be blocked and removed at once from app stores and other locations. Given this, manual editing of assertions should be available as a choice, but all applications should have automatic assertions feature to a considerable extent. They should also only use CAI C2PA tool for final signing.
  • C2PA binding should be incorporated into the system OS natively such that after immediate creation of an asset, a manifest is generated automatically.
  • PII need not be included in the manifest for privacy reasons. However, the device id or hardware id should be added, and this must be done by C2PA automatically without user approval.
  • Media type conversion websites should use C2PA software (provided C2PA resolves problem 2)
  • Durable content detection principles need to be adopted to maximize the strength of the system, and only a single common DLT-based system should be used for tracking and storing manifests (storing on the hardware would prove impractical, but it can be an option; DLT, however, needs to be mandatory).
  • An intermediate authority for checking the contents of the manifest needs to be set up before publishing on the DLT. This check could be performed by a human or by AI such as large language models (LLMs).
  • Native OS support needs to exist for a feature whereby an asset generated offline does not leave the device until its manifest is created/uploaded on the DLT. Likewise, for offline devices such as cameras, while a C2PA manifest can be generated using the camera's system clock, a new manifest needs to be signed immediately, using a CA, by the device to which the asset is transferred, and until then forwarding of that asset should be prohibited.
  • A feature for detecting third-party tools used for signing should be established by the CAI C2PA tool or the relevant responsible organization. If it is detected that signing was done by another tool, the asset should be discarded. This feature should also be shared with social media and other major platforms.
  • Social media websites and other distribution channels should reject assets that do not have any C2PA metadata associated with them.
  • Only trusted CAs should be used for signing, and the list of trusted CAs needs to be supported by CAI or some other authority.
  • Soft fingerprinting can be used to track duplicates on the DLT systems, and these duplicates can be policed accordingly.
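
To illustrate the soft fingerprinting point in the list above, here is a minimal sketch of an average-hash fingerprint compared by Hamming distance. Average hash is our choice for illustration only; this paper does not prescribe a particular algorithm. The function names, the small grayscale grids and the max_distance threshold are all hypothetical, and a real system would first decode and downscale the image.

```python
def average_hash(pixels):
    """Return a bit-string fingerprint: '1' where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


def hamming_distance(a, b):
    """Count the positions at which two equal-length fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_probable_duplicate(pixels_a, pixels_b, max_distance=1):
    """Treat two images as duplicates when their fingerprints nearly match.

    max_distance is an illustrative threshold, not a recommended value.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Hypothetical 4x4 grayscale grids standing in for downscaled images.
original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
brightened = [[value + 10 for value in row] for row in original]
inverted = [[255 - value for value in row] for row in original]

print(is_probable_duplicate(original, brightened))  # True: brightness shift keeps the pattern
print(is_probable_duplicate(original, inverted))    # False: every bit flips
```

Unlike a cryptographic hash, a fingerprint of this kind tolerates small edits such as a uniform brightness change, which is what makes it a candidate for spotting re-uploaded duplicates on a ledger.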

[Infrastructure costs, CA subscription costs and any other costs are not considered in the points above.] We believe that such heavy management is critical to support the true provenance of assets and to create a robust and reliable system; otherwise, C2PA would prove to be just another metadata format with some security-based features.

Infosys collaboration with C2PA

Infosys has committed to Responsible AI by collaborating with C2PA [32]. As machine-generated content increases, strong provenance tracking becomes crucial. Continuous improvement is essential, encompassing both the standard itself and compatible tools. Infosys, with its commitment to Responsible AI (RAI) and ethical governance, leverages the expertise of its Responsible AI Office to collaborate with C2PA members, ensuring that a mature and reliable content provenance standard benefits the entire ecosystem and the public.

Conclusion

Even though technical, scalability and other issues are important to address, the C2PA method proves useful mainly when it is reliable and secure. Unfortunately, several loopholes exist today, and may persist well into the future, that can be used to break the C2PA method. However, we believe that C2PA is a novel idea that shows enormous potential. It needs more time to mature and adapt. With further updates and development of the C2PA method, together with proper research and implementation techniques, it might become a great standard to adhere to.

References and Citations

It should be noted that the research for this paper was conducted in early February 2024 with the information available at that point in time. Throughout the preparation of this whitepaper, information and insights were drawn from a range of reputable sources. Some of the key references that informed the content of this whitepaper include:

  1. https://contentauthenticity.org/faq
  2. https://c2pa.org/
  3. https://github.com/contentauth/c2patool
  4. https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html#_version_history
  5. https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html#_introductory_terms
  6. https://c2pa.org/specifications/specifications/2.0/specs/_images/Overview_Diagram.svg
  7. https://contentcredentials.org/verify
  8. https://contentcredentials.org/
  9. https://www.vox.com/culture/24098724/kate-middleton-editing-photo-explained
  10. https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf
  11. https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html#_hashing_2
  12. https://c2pa.org/specifications/specifications/1.3/ai-ml/ai_ml.html#_ai_ml_model_content_credential
  13. https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html#_general_boxes_hash
  14. https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html#_digital_signatures
  15. https://c2pa.org/specifications/specifications/1.0/specs/C2PA_Specification.html#_design_goals
  16. https://c2pa.org/specifications/specifications/1.3/guidance/Guidance.html#_distributed_ledger_technology
  17. https://docs.numbersprotocol.io/
  18. https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html#_partially_supported_schemas
  19. https://www.qualcomm.com/news/onq/2024/04/shaping-the-future-of-ai-responsibly
  20. https://www.sony.com/content/sony/en/en_us/SCA/company-news/press-releases/sony-electronics/2024/sony-electronics-delivers-firmware-updates-including-c2pa-compli.html
  21. https://leica-camera.com/en-US/photography/content-credentials
  22. https://c2pa.org/post/openai_pr/
  23. https://contentauthenticity.org/blog/durable-content-credentials
  24. https://twitter.com/botsdontcry1/status/1765285019646308374
  25. https://opensource.contentauthenticity.org/docs/c2pa-python/#creating-a-manifest-json-definition-file
  26. https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html#_redaction_of_assertions
  27. https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html#_design_goals
  28. https://www.hackerfactor.com/blog/index.php?/archives/1024-IEEE,-BBC,-and-C2PA.html
  29. https://www.bbc.com/news/world-latin-america-68462851
  30. https://contentcredentials.org/verify?source=https://d2zo1lns8kb6p9.cloudfront.net/newslabs/origin/trial-01/cps/live-ugc-cps-68462851-bfd5a341-d141-4616-aeaa-b1d9be7da043.mp4
  31. https://github.com/numbersprotocol/numbers-c2pa
  32. https://www.infosys.com/newsroom/features/2024/collaborates-two-global-initiatives-c2pa-aiga.html

Authors

Kaushal Rathi - Infosys Responsible AI Office

Senior Associate Analyst - Data Science

Sathyanarayana Sampath Kumar - Infosys Responsible AI Office

Senior Data Scientist

Mandanna A N - Infosys Responsible AI Office

Principal - Enterprise Applications