The ICO exists to empower you through information.

In brief: Our position on controllers’ response to people’s information rights remains largely the same. See the original call for evidence for the full analysis.

The consultation enabled us to refine it in the following ways:

  • Organisations must design and build systems that implement the data protection principles effectively and integrate necessary safeguards into the processing. This would, in turn, put organisations in a better position to comply with the requirement to facilitate people’s information rights.
  • Even though we referenced article 11 in this chapter, organisations should not view it as a way to avoid obligations. Organisations need to demonstrate that any reliance on article 11 is appropriate and justified in the circumstances. They must still give people the opportunity to provide more information to enable the identification of their data.

Respondents

In May 2024, we published the fourth chapter of our consultation series. This set out our policy position on engineering information rights into generative AI models.   

We received 28 responses, three of which came from respondents identifying as members of the public. Six responses came via our survey, and a further 22 were received directly by email. The most represented sectors were:

  • creative industries (11); 
  • industry groups (four); and 
  • the technology, insurance and civil society sectors (three each).

Of the survey respondents, three (50%) agreed with our initial analysis. There was clear agreement about the importance of information rights. However, there was both disagreement and a lack of evidence, particularly from technology firms, about how to ensure information rights are engineered into generative AI.

Original call for evidence 

In our original call for evidence,51 we set out several positions on information rights and generative AI. To recap, the key positions were as follows:

Firstly, when generative AI developers and deployers are controllers, they need to show that they have clear and effective processes for enabling people to exercise their rights over their personal data. This applies whether their data is contained in the training, fine-tuning or output data, or in the model itself.

Secondly, developers and deployers need to evidence how they are making sure people have meaningful, concise and easily accessible information about the use of their personal data.

Thirdly, developers and deployers need to clearly justify the use of any exemptions and demonstrate how they are safeguarding people’s interests, rights and freedoms. 

Finally, we covered article 11. We said that if developers argue they cannot respond to requests because they cannot identify individuals (in the training data or anywhere else), the law requires them to explain this to the requester and demonstrate why this is the case. 

Key points from the responses 

On the need for generative AI developers and deployers to show they have clear and effective processes for enabling people to exercise their rights, the following points arose in the responses:

  • Creative industry respondents felt the development of generative AI was not respecting the information rights of creators. They argued that generative AI developers are chiefly responsible for ensuring information rights are being exercised.
  • Several respondents, particularly generative AI developers and industry bodies, consistently argued that it is difficult to facilitate people’s information rights once data has been ingested into a training dataset and used to train a generative AI model. One large generative AI developer argued that deployers should be mainly responsible for facilitating information rights. 
  • Generative AI developers and industry respondents mainly argued that certain measures, such as retraining a model to erase the influence of personal data, would be impractical or not technically feasible. They instead argued that rights such as rectification and erasure should be exercised at the application level, particularly via the use of output filters. 
  • Civil society respondents argued that if generative AI developers cannot uphold information rights, then their development and deployment is unlawful. They said that non-compliant models must be retrained on compliant data, and that such models cannot be allowed to persist simply because some claim they are innovative. They also said that, while they accept that new technologies require new ways of meeting information rights, untested and unproven methods are not acceptable. 

On the need for generative AI developers and deployers to evidence how they are making sure people have meaningful, concise and easily accessible information about the use of their personal data, the following key points arose: 

  • Generative AI developers and industry bodies argued that it would be disproportionate to inform every person about the processing of web-scraped data. They referenced the exemption at article 14(5)(b) of the UK GDPR. This provision provides an exception from controllers’ obligations under the right to be informed when they receive personal data from a source other than the individual, if providing this information proves impossible or would involve disproportionate effort, or would be likely to render impossible or seriously impair the processing’s objectives.52
  • However, civil society groups and the creative industries argued that web-scraping is invisible processing. They argued that this processing cannot meet people’s reasonable expectations, even taking into account article 14(5)(b).  
  • Generative AI developers and industry bodies argued that they meet transparency and notice requirements through public notices generally explaining that they are using publicly accessible data. Some also argued that broad categories of source data, such as ‘publicly accessible information’, provide appropriate levels of transparency, as opposed to exhaustive lists of sources.  

On the need for generative AI developers and deployers to clearly justify the use of any exemptions and demonstrate how they are safeguarding people’s interests, rights and freedoms, the following key points arose in the responses: 

  • Tech industry trade bodies raised article 11 and argued that if a developer cannot identify who the data in a model relates to, it is not personal data.  
  • One law firm argued that data minimisation is important. They said that precise collection criteria and excluding certain sources play a key role in safeguarding rights and freedoms.
  • This was linked to another point made by many respondents: most people are ill-equipped to access information about the processing or to understand its technicalities. Organisations should not rely on them to find the correct information and exercise their rights.
  • We received numerous arguments about machine unlearning.53 These mainly pointed to its theoretical application and not any current practical usage. 
  • The technology sector and industry groups strongly emphasised input and output filters as a suitable safeguard. Civil society expressed strong doubts about the effectiveness of these filters, citing the ease of jailbreaking through prompt injection.54

Our response 

We welcome the agreement from all respondents about the importance of ensuring information rights in the context of generative AI. It is vital that, across the generative AI lifecycle, organisations have processes in place to enable and record people exercising their information rights. However, we did not receive clear and verifiable evidence from generative AI developers or the wider industry about the practical measures that could enable people to exercise their rights.

Data protection by design and by default is a legal requirement. This means that, when organisations develop generative AI systems, they must adopt appropriate measures to protect people’s rights from the outset. We are increasingly concerned that many organisations developing and deploying generative AI models and systems do not have measures in place to effectively respond to people’s information rights requests, particularly where those requests concern web-scraped personal data. In the absence of effective tools to comply with people’s requests, and depending on the organisation’s lawful basis, their processing may be unlawful. 

Not all information rights are absolute. For example, if an organisation is relying on legitimate interests as the lawful basis and there remains an overriding legitimate interest in continuing the processing (which also passes the three-part test), then the right to erasure would not apply.55 However, organisations must consider rights requests on a case-by-case basis. There may be some cases where they do not have compelling legitimate grounds which override the individual’s rights. 

Many respondents mentioned output filters as a useful tool for implementing information rights. However, these may not be sufficient, as they do not actually remove the data from the model.
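
To illustrate this limitation, a minimal sketch of a naive output filter follows. Everything in it is hypothetical: the names, the deny-list and the filtering approach are invented for illustration and are not drawn from any respondent’s system.

    import re

    # Hypothetical deny-list of personal data that people have asked to be
    # erased. In practice this would be a managed suppression list.
    ERASURE_LIST = ["Jane Doe"]

    def output_filter(model_response: str) -> str:
        """Redact deny-listed personal data from a generated response.

        This suppresses the data in the output only; the underlying
        personal data remains encoded in the trained model's weights.
        """
        filtered = model_response
        for item in ERASURE_LIST:
            # Naive exact-match redaction: a paraphrase or obfuscation of
            # the name (for example "J4ne D0e") passes straight through,
            # which is the jailbreaking risk civil society respondents raised.
            filtered = re.sub(re.escape(item), "[REDACTED]", filtered,
                              flags=re.IGNORECASE)
        return filtered

    # The filter catches a literal mention...
    print(output_filter("Jane Doe lives at 1 Example Street."))
    # ...but an obfuscated mention evades it, and the model is unchanged.
    print(output_filter("J4ne D0e lives at 1 Example Street."))

The sketch makes the two weaknesses concrete: the personal data still sits in the model, and the filter only blocks the forms of the data it has been told to look for.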

Organisations must therefore have mechanisms in place to fulfil information rights56 requests for both the training data and, if a model contains personal data, the trained model itself. The controller is accountable for complying with people’s information rights. If a developer and a deployer are joint controllers, they must determine which of them is best positioned to respond to information rights requests. 
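
By way of contrast with output filtering, the following sketch shows what fulfilling an erasure request against training data might involve. It is a simplified, hypothetical illustration: the dataset format, field names and identifiers are all invented.

    # Identifiers taken from verified erasure requests (hypothetical).
    erasure_requests = {"jane.doe@example.com"}

    def purge_training_records(records: list[dict], erased_ids: set[str]) -> list[dict]:
        """Drop training records linked to people who have exercised erasure.

        Removing records here keeps future training runs clean, but it does
        not undo the influence of that data on models already trained; hence
        the distinction between the training data and the model itself.
        """
        return [r for r in records if r.get("subject_id") not in erased_ids]

    dataset = [
        {"subject_id": "jane.doe@example.com", "text": "..."},
        {"subject_id": "other@example.com", "text": "..."},
    ]
    clean_dataset = purge_training_records(dataset, erasure_requests)
    print(len(clean_dataset))  # one record remains after the purge

A purge of this kind addresses the stored training data; removing personal data from the trained model itself remains the harder problem that the machine unlearning literature discusses.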

We expect generative AI developers and deployers to substantially improve how they fulfil their transparency obligations towards people, in a way that is meaningful rather than a token gesture. Testing whether the measures in place actually work, as well as trialling new and more innovative solutions, can help organisations to comply with the transparency principle. It may also help them with the principles of lawfulness and purpose limitation. It will also enable people to obtain meaningful information about the processing so that they can exercise their information rights. We will continue to engage with stakeholders on promoting effective transparency measures, without shying away from taking action when our regulatory expectations are ignored. 

Finally, controllers should not apply article 11 so broadly that it has the effect of undermining people’s rights. To rely on article 11, controllers would need to establish that they cannot identify a person. Controllers must assess their ability to identify a person on a case-by-case basis. In circumstances where a controller is unable to identify a person, they should inform the person and offer easy ways for the person to provide additional information. This may enable the organisation to identify that person’s personal data. In addition, article 11 serves to guard against unnecessary retention of data, in line with the data minimisation principle. 

51 Generative AI fourth call for evidence: engineering individual rights into generative AI models

52 Exemptions

53 See glossary and A Survey of Machine Unlearning

54 Here jailbreaking refers to techniques used to bypass the input and output filters that restrict or control what the AI model can process and produce. When an entity “jailbreaks” a generative AI system, they exploit weaknesses in these filters, often by crafting specific prompts or inputs that cause the AI to ignore or circumvent its constraints. This can result in the AI providing responses that it would otherwise block, such as sensitive information, prohibited content or potentially unsafe outputs.

55 This exemption would not apply to the right to rectification, provided the individual has the required evidence that the data in question is inaccurate.

56 It should be noted that controllers need effective tools that enable them to respond to all information rights requests, not only the right to erasure.