The ICO exists to empower you through information.

Our consultation series on generative AI and data protection covered five key areas:  

  • The lawful basis for web scraping to train generative AI models.5
  • Purpose limitation in the generative AI lifecycle.6
  • Accuracy of training data and model outputs.7
  • Engineering individual rights into generative AI models.8
  • Allocating controllership across the generative AI supply chain.9

In total, we received 192 responses from organisations and 22 from members of the public. The majority of responses came from the creative industries, trade or membership bodies, the technology sector (including ‘big tech’ firms) and law firms. We also held roundtable sessions with civil society, the creative industries and technology firms.

Many of the responses were highly detailed. They often combined evidence (eg technical explanations of generative AI), analysis (eg interpretations of the application of data protection law) and arguments (eg the negative or positive effects of generative AI on a particular stakeholder).

We are grateful to those who responded to the consultation. We are also grateful to those who were willing to discuss these issues in further detail with us. In particular, thank you to the British Screen Forum, the Ada Lovelace Institute and TechUK for facilitating roundtable discussions with the creative sector, civil society and the technology sector respectively. We also thank colleagues at the French data protection authority, the CNIL, for sharing their insights on this issue. 

Summary of positions after consultation

The following points set out our positions after reviewing the consultation responses:

We retained our position on purpose limitation,10 accuracy11 and controllership.12

We updated our position on the legitimate interests lawful basis for web scraping to train generative AI models.13

  • We heard that data collection methods other than web scraping exist which could potentially support the development of generative AI. For example, publishers could collect personal data directly from people and license it in a transparent way. It is for developers to demonstrate that web scraping is necessary to develop generative AI. We will continue to engage with developers and generative AI researchers on the extent to which they can develop generative AI models without using web-scraped data.
  • Web scraping is a large-scale processing activity that often occurs without people being aware of it. This sort of invisible processing poses particular risks to people’s rights and freedoms. For example, if someone doesn’t know their data has been processed, they can’t exercise their information rights. We received minimal evidence on the availability of mitigation measures to address this risk. This means that, in practice, generative AI developers may struggle to demonstrate how their processing meets the requirements of the legitimate interests balancing test. As a first step, we expect generative AI developers to significantly improve their approach to transparency. For example, they could consider what measures they can provide to protect people’s rights, freedoms and interests. This could involve providing accessible and specific information that enables people and publishers to understand what personal data the developer has collected. We also expect them to test and review these measures.
  • We received evidence that some developers are using licences and Terms of Use (ToU) to ensure deployers are using their models in a compliant way. However, to provide this assurance, developers will need to demonstrate that these documents and agreements contain effective data protection requirements, and that these requirements are met. 

We updated our position on engineering individual rights into generative AI models, as set out in chapter four of the consultation.14

  • Organisations acting as controllers must design and build systems that implement the data protection principles effectively and integrate necessary safeguards into the processing. This puts organisations in a better position to comply with the requirement to facilitate people’s information rights.
  • Article 11 (on processing which does not require identification) may have some relevance in the context of generative AI. However, organisations relying on it need to demonstrate that their reliance is appropriate and justified. For example, they must demonstrate they are not able to identify people. They must also give people the opportunity to provide more information to enable identification.

5 Generative AI first call for evidence: The lawful basis for web scraping to train generative AI models

6 Generative AI second call for evidence: Purpose limitation in the generative AI lifecycle

7 Generative AI third call for evidence: Accuracy of training data and model outputs

8 Generative AI fourth call for evidence: Engineering individual rights into generative AI models

9 Generative AI fifth call for evidence: Allocating controllership across the generative AI supply chain

10 Generative AI second call for evidence

11 Generative AI third call for evidence 

12 Generative AI fifth call for evidence

13 Generative AI first call for evidence 

14 Generative AI fourth call for evidence: Engineering individual rights into generative AI models