Synthetic media and its identification and detection

Introduction

Synthetic media refers to content partially or entirely generated using AI or machine learning, including images, video and audio. It is already used in entertainment, advertising, and personalised content. 

Personal data is often used in the creation, distribution and targeting of synthetic media. This can include biometric information used to train the AI models that generate realistic-looking people, or to insert or substitute individuals into media.

As synthetic media technology has advanced, distinguishing deepfake content from real media has become increasingly difficult, creating a pressing need for reliable identification and detection methods. This chapter looks at the technological and social means that could be used in future, as the quality of synthetic media improves, to protect individuals and their data.

Protecting against malicious synthetic media and deepfakes

Deepfakes are among the most controversial applications of synthetic media. The Oxford English Dictionary defines a deepfake as “[a]ny of various media, [especially] a video, that has been digitally manipulated to replace one person’s likeness convincingly with that of another, often used maliciously to show someone doing something that he or she did not do.”

Malicious uses of synthetic media are designed to be convincing, so that a viewer might be fooled into believing that the events they depict really happened. As the technology used to create synthetic media has advanced, synthetic media and deepfake content have become less obviously fake. This has created a need for methods to separate artificial content from real content.

Both the public and private sectors have seen initiatives to address this need, including the Home Office’s deepfake detection challenge and, in the US, the Tech Accord to Combat Deceptive Use of AI in 2024 Elections, which specifically addressed the use of deepfakes to create election disinformation.

The EU AI Act 97 requires providers of AI systems generating synthetic content to ensure outputs are marked as artificial in a detectable, machine-readable format. This places responsibility on creators to prevent misuse.

State of development

As the volume and sophistication of synthetic media and deepfakes is likely to increase in future, so must the measures used to detect them and malicious uses of them. 
Various methods might be used to allow individuals and organisations to determine if a piece of media is original and unaltered or has been generated or manipulated by AI.

Responsibility for transparency may be placed on the:

  • creator of the media; 
  • organisations providing the tools to create the media; 
  • organisations acting as intermediaries in the storage or sharing of that media.

Creators of malicious synthetic media are unlikely to comply with schemes marking content as manipulated, placing greater responsibility on hosting platforms to flag or remove such content. In the future, the absence of verification may increasingly signal that content has been manipulated.

The following measures are being developed or in some cases are already on the market. To be effective, they will need to keep pace with the sophistication of synthetic media.

Certification of the provenance of media

Certification of provenance focuses on allowing creators or custodians of a piece of content to store information about it. This might include how the media was created and by whom. Another use might be showing whether it is original and unaltered or is artificially generated or manipulated.

Several large technology companies, as well as media and synthetic content creation organisations, are members of the Coalition for Content Provenance and Authenticity 98. This initiative provides a technical standard 99 that creators and publishers can use to certify their media, allowing individuals to establish whether what they’ve received has been altered.
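
The sketch below illustrates the general idea behind certification of provenance: binding information about a piece of media to a hash of its content and signing the result, so that later alteration can be detected. It is a simplified illustration only, not an implementation of the C2PA standard; the field names and the shared signing key are assumptions made for this example (real schemes typically use certificate-based, public-key signatures).

```python
# Illustrative sketch of a signed provenance record. NOT the C2PA standard:
# field names and the shared-key signature are assumptions for this example.
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # placeholder; real systems use asymmetric certificates


def create_manifest(media_bytes: bytes, creator: str, tool: str) -> dict:
    """Record how the media was produced and bind it to a hash of the content."""
    manifest = {
        "content_sha256": hashlib.sha256(media_bytes).hexdigest(),
        "creator": creator,
        "generator_tool": tool,
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return manifest


def verify_manifest(media_bytes: bytes, manifest: dict) -> bool:
    """Check that the manifest is genuine and the media has not been altered."""
    claimed = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    content_hash = hashlib.sha256(media_bytes).hexdigest()
    return hmac.compare_digest(expected, manifest["signature"]) and claimed["content_sha256"] == content_hash


media = b"...image bytes..."
record = create_manifest(media, creator="Example Studio", tool="SynthGen 2.0")
print(verify_manifest(media, record))            # True: content matches the certified original
print(verify_manifest(media + b"edit", record))  # False: content has been altered since certification
```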

Watermarking

Related to the idea of certification of provenance is the idea of ‘watermarking’ content. Adding a watermark to content to show it is synthetic works in a similar way to watermarking a still image. A signifier is inserted into the media, which can then be read to show its origin. In the US in 2023, the president signed an executive order requiring watermarking tools to be developed for government communications to protect citizens from disinformation. 

Current watermarking tools are reportedly vulnerable to tampering 100 and may degrade media quality 101, raising questions about their long-term efficacy 102.
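
To illustrate the basic mechanism, the sketch below embeds a short bit pattern (a signifier) in the least significant bits of image pixel values and reads it back. This is a deliberately simple, assumed example rather than any real watermarking product; as noted above, production schemes are far more robust and imperceptible, yet can still be tampered with.

```python
# Minimal sketch of an invisible watermark: a bit pattern written into the least
# significant bits of pixel values. Illustrative only; real watermarking tools use
# more robust schemes, and even those can be tampered with or degrade quality.
import numpy as np

MARK = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)  # assumed "synthetic content" signifier


def embed_watermark(pixels: np.ndarray) -> np.ndarray:
    """Write the signifier into the least significant bits of the first few pixels."""
    flat = pixels.flatten()
    flat[: MARK.size] = (flat[: MARK.size] & 0xFE) | MARK
    return flat.reshape(pixels.shape)


def read_watermark(pixels: np.ndarray) -> bool:
    """Return True if the signifier is present in the media."""
    return bool(np.array_equal(pixels.ravel()[: MARK.size] & 1, MARK))


image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in for real pixel data
watermarked = embed_watermark(image)
print(read_watermark(watermarked))      # True: the mark is readable in unchanged media
print(read_watermark(watermarked ^ 1))  # False: even a trivial edit destroys this naive mark
```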

Systematic detection of synthetic media

AI – the technology that makes synthetic media and deepfakes possible – can also be used to analyse media and determine whether it has been generated or manipulated artificially.

Systematic or automated detection systems rely on inconsistencies or characteristic signs of editing in (for example) facial expressions or vocal patterns that might not be immediately apparent. This might mean looking for a mismatch between the shape of a mouth (the ‘viseme’) and the sound coming out of it (the ‘phoneme’), or picking out the boundary where the original material and the inserted material have been blended. Often, several tests are run on the same piece of media and their results combined into an aggregated risk score.
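
As a minimal sketch of how the results of several tests might be combined, the example below weights each test’s score and compares the aggregate to a threshold. The test names, weights and threshold are illustrative assumptions, not a description of any particular detection product.

```python
# Hypothetical per-test scores between 0.0 (no sign of manipulation) and 1.0 (strong sign).
test_scores = {
    "viseme_phoneme_mismatch": 0.82,        # mouth shape does not match the spoken sound
    "blending_boundary": 0.64,              # seam where inserted material meets the original
    "facial_expression_consistency": 0.31,  # unnatural expressions or blinking
}

# Assumed weights reflecting how much each test is trusted.
weights = {
    "viseme_phoneme_mismatch": 0.5,
    "blending_boundary": 0.3,
    "facial_expression_consistency": 0.2,
}

aggregated_risk = sum(test_scores[name] * weights[name] for name in test_scores)
print(f"Aggregated risk score: {aggregated_risk:.2f}")  # 0.66 for the scores above

FLAG_THRESHOLD = 0.6  # assumed threshold above which content is flagged
if aggregated_risk >= FLAG_THRESHOLD:
    print("Flag as likely synthetic or manipulated and route for human review")
```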

If an automated system used to detect synthetic media itself processes personal information as part of the analysis, it will need to comply with data protection law.

Account reputation and behaviour-based measures

Account reputation-based filtering and moderation have been used on other platforms and for other media to try to limit spam and offensive content. These measures look not just at the content, but at who is creating and sharing it. If those individuals or accounts are rated poorly for conduct or accuracy, the content they share can be filtered out or otherwise limited.

This sort of detection and intervention could be performed by ‘human fact-checkers’, for example, or platforms could use automated means to detect behaviours that look suspicious.
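
The sketch below shows, under assumed field names and thresholds, how a platform might combine an account’s track record with simple rules to decide whether to allow, limit or route its content for human review. It is illustrative only and does not describe any specific platform’s policy.

```python
# Illustrative reputation-based moderation rules; the fields and thresholds are assumptions.
from dataclasses import dataclass


@dataclass
class AccountReputation:
    account_id: str
    flagged_posts: int       # past posts flagged as misleading or manipulated
    total_posts: int
    verified_identity: bool  # whether the account holder's identity has been verified


def moderation_action(rep: AccountReputation) -> str:
    """Decide how to treat new content from this account based on its history."""
    if rep.total_posts == 0:
        return "review"  # no history yet: route to human fact-checkers
    flag_rate = rep.flagged_posts / rep.total_posts
    if flag_rate > 0.2:
        return "limit"   # poor track record: reduce distribution of the content
    if not rep.verified_identity and flag_rate > 0.05:
        return "review"
    return "allow"


print(moderation_action(AccountReputation("acct-1", flagged_posts=30, total_posts=100, verified_identity=False)))  # limit
print(moderation_action(AccountReputation("acct-2", flagged_posts=1, total_posts=500, verified_identity=True)))    # allow
```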

Social measures and media literacy

As synthetic media becomes a larger portion of what is created and shared by individuals, we will need to understand the changing nature of our relationship with shared media. This may mean a fundamental shift in whether we trust received media by default or not. 

Education about how people can protect themselves from harms caused by malicious uses of synthetic media and deepfakes may complement the technical measures described in this chapter. An analogy is how people have been educated to recognise spam email. In future, suspect media may be treated with the same scepticism that we apply today to spam email promising riches.

Regardless of the methods employed, the role of human identifiers, moderators and fact-checkers will probably expand, with the need for an ‘expert eye’ to supplement detection systems. Humans would offer a second line of identification for any content that automated means flag as likely to be of synthetic origin.

Fictional future scenario

Simran, a finance officer for a large construction company, receives an email from their company’s finance director, with an invoice attached. The invoice is for a large sum for an organisation Simran doesn’t recognise. 

The email seems genuine, but the company has previously given employees training about recognising cyber threats, so Simran decides to email the director to query the invoice. They receive an out-of-office reply from the finance director. However, the email with the invoice attached has a number to call about any issues. Simran calls it and the person at the other end sounds like the finance director. The director seems to be in a hurry to complete the call, though, and urges Simran to make the payment. 

Now suspicious, Simran asks for a video call to make sure everything is above board, and the finance director reluctantly agrees. When the video call comes in, again it appears to be the director speaking, and their voice seems normal. However, Simran’s company has invested in technology for its videoconferencing suite that detects synthetic media, and it alerts them to the high probability that both the video and voice have been manipulated. 

Their suspicions confirmed, Simran terminates the call, reports the email as a ‘spear phishing’ scam, and alerts their IT department.

Data protection and privacy implications

Where personal information is used in developing or using models to create synthetic media, data protection law applies. This is true even if the final synthetic media doesn’t contain any personal information. 

Personal information is also likely to be used in the identification and detection of synthetic media. Here, we focus on the data protection and privacy implications of such methods.

Certification of provenance and watermarking

Personal information may be processed in a provenance certification system or watermarking system – for example, if the identity of the creator of a piece of media is recorded, or their location data. 

Under these circumstances, organisations controlling or processing that personal data will need to be aware of their responsibilities under data protection law, including how to address rights requests made by individuals. These might include rectification of inaccurate data stored within the certification process or watermark, or erasure of personal data that has been recorded – which might affect the validity of the certification or watermark.

Systematic detection of synthetic media

Measures designed to identify and classify media as either original or manipulated by AI may be processing personal information as part of that identification process. 

Analysing multimodal media against samples of known ‘real’ examples from individuals is likely to involve processing the personal data of those individuals during the comparison. For example, if a piece of video (which may have been altered) of a world leader is analysed, that information relates to an identifiable person, so the analysis is likely to be processing personal information.

Automated content moderation

If automated moderation is used (regardless of the detection method), platforms must comply with Article 22 of the UK GDPR. This grants individuals rights regarding decisions made solely through automated processing where those decisions have a legal or similarly significant effect on them. The ICO has previously produced guidance for organisations considering using automated decision-making for content moderation.

Security

Deepfakes and synthetic media can be used to fool recipients into taking actions they would not normally take. Here, malicious synthetic media and deepfakes might be used to:

  • bypass authentication systems by using a deepfaked identity; 
  • persuade people, once trust has been gained, to divulge personal information for criminal or malicious purposes, including blackmail;
  • convince people to make payments to scammers, such as through authorised push payment fraud 103; and
  • disinform for political or ideological ends.

These outcomes might have additional data protection implications, including upholding the security of data stored. Organisations must use appropriate technical and organisational measures to secure the personal information they process 104. This would include protecting it from unauthorised access or damage by those using deepfaked identities. 

Recommendations and next steps

As technology advances and capabilities improve, so too will the quality and ease of creation of synthetic media. While this may lead to greater adoption in the arts and culture, entertainment, and education, malicious uses of synthetic media are also likely to remain present or increase. This will make the ability to identify and detect where media has been manipulated increasingly necessary if people are to be protected. Therefore, those who build or maintain systems that identify or detect synthetic media will need to ensure they do so in a manner compliant with data protection law (if those systems are processing personal data). Building data protection by design into systems designed to watermark, certify or analyse content or its distribution will ensure that users are confident their personal data is being handled appropriately.

To better help controllers who are or will be using synthetic media identification and detection, the ICO will:

  • develop our understanding of the processing of personal data within synthetic media and its detection/identification, including as a subset of our work on generative AI;
  • work with other regulators (including through the DRCF) to build our knowledge of the effects of synthetic media and its detection, and identify areas of critical regulatory intersection; and
  • continue to engage with the public, academia, interest groups and industry about synthetic media and its identification and detection.

The ICO is committed to enabling responsible innovation and supporting organisations that are thinking of developing synthetic media identification and detection systems. Our innovation services aim to help innovators bring privacy-respecting products and services to market.