
In addition to the impacts identified in Table 1, we also asked respondents for further information on their selected response and evidence on how impacts could be quantified. This section discusses the feedback that was received across the five calls for evidence where respondents provided further information. 

Lawful basis 

As shown in Table 1, the first call for evidence received the highest level of engagement, with around 50 respondents. The majority of respondents agreed with the benefits we identified in Table 1, although only 10 respondents provided further clarification. Of those who provided further information, one noted that:

“the…approach will benefit our organisation and other responsible developers of generative AI. The (ICO’s) approach validates our core belief that only developers…who respect IP and privacy rights should be able to benefit from the development and deployment of generative AI models.”

Another respondent argued that a potential benefit would be an improved ability to define organisational strategy and approach to the use or adoption of generative AI models.

Several respondents highlighted the importance of regulatory certainty and the impact this can have on organisations’ willingness to invest. One respondent commented that: 

“the lower the certainty an organisation can enjoy, the less likely they are to enter that market.”

Four different organisations responding to the first call for evidence identified the following costs:

  • mitigating against unauthorised scraping;
  • establishing legitimate interests on personal data used in training;
  • assessing consent and other activities associated with data protection impact assessments; and
  • for deployers of generative AI models, obtaining transparency information around the sourcing of training data from third-party developers.

Elaborating further on potential costs, one respondent stated that: 

“We consider it inappropriate to require the developer to impose downstream controls, which would have a fundamental chilling effect on any development of AI that is open source or open innovation in the UK.”

Another respondent felt that the proposals were likely to result in additional time costs for organisations implementing the ICO’s approach:

“We also believe that organisations, data controllers and processors, will struggle to understand and apply the ICO’s approach, resulting in time and resources costs for them.”

One organisation highlighted the importance of web-scraped data for model efficacy. They made a number of suggestions around how to quantify impacts. 

“Estimating the benefits of utilising web-scraped data for training generative AI models involves considering several factors: 

Firstly, leveraging publicly accessible data through web scraping enables a more extensive and diverse dataset, enhancing the model's ability to generalise and generate realistic outputs. This broad dataset contributes to the model's accuracy and performance during training, fine-tuning, and post-deployment phases.

The benefits can be quantified by assessing the efficiency gains in model development, as large-scale web scraping provides a substantial volume of data crucial for training generative AI models effectively. Additionally, the diverse data sources contribute to the model's adaptability across various applications, potentially expanding its market usability.

Furthermore, the potential for innovation and competitive advantage arises from the improved capabilities of generative AI models trained on comprehensive datasets. This may lead to the development of cutting-edge applications, enhancing the organisation's product or service offerings.

To calculate these benefits, one can consider the reduction in data acquisition costs compared to alternative methods, the increased efficiency in model development, and the potential revenue growth resulting from the superior performance of generative AI models. It's essential to weigh these benefits against the legal and ethical considerations outlined in the ICO consultation, ensuring compliance and responsible use of web-scraped data.”

Other responses noted that model development had already reached a stage where “the genie is out of the bottle”, given the prevalence of web-scraped data in existing frontier models. They said that the need to detail specific processing purposes at each stage of the AI lifecycle may “inhibit firms’ ability to develop and deploy Gen-AI in practice”. 

Although the overall balance of responses indicates a net anticipated benefit, more detailed feedback suggests that, for developers, the impact of a more limited use of web-scraped data is likely to be, on balance, a net cost. However, it is challenging to draw conclusions for the market as a whole due to the limited sample of survey responses.

Purpose limitation

With 17 responses, there was limited engagement on the impact considerations of the second call for evidence. Only two respondents elaborated further on potential impacts, beyond those set out in Table 1. They noted that the approach may result in additional benefits and costs.

One respondent who highlighted an additional benefit suggested that it would increase their confidence in risk management. However, another respondent who highlighted an additional cost suggested:

“A key cost may be reduced ability to innovate with general purpose Gen-AI models…due to incompatibility with tightly defined (and potentially contractually imposed) purposes higher up the lifecycle.”

None of the respondents were able to provide a cost estimate or supporting evidence for quantifying the impacts.

While the net anticipated impact of this call for evidence was positive,[63] it is challenging to draw conclusions on the impact for the market as a whole given the limited levels of engagement.

Accuracy 

As with the previous call for evidence, there was limited engagement, with only nine respondents answering the impact-related questions on whether the approach would result in costs or benefits. Only three respondents provided further information beyond the impacts identified in Table 1.

One respondent, representing the creative industries, highlighted: 

“one of the benefits our members would incur is a more equal playing field, when being forced to compete with synthetic outputs.”

Another respondent highlighted the potential cost impact of information sharing between developers and deployers of generative AI:

“measuring accuracy and addressing any risks or challenges requires detailed information sharing and cooperation between developers and deployers. It is not yet clear whether the market will evolve in a way that will facilitate such an approach. This will have an impact on the costs of implementing Gen-AI use cases.”

Most respondents could not provide an estimate for the impacts. However, one organisation suggested that the benefits of the approach could be estimated by: 

“considering market demand for compliant solutions along with cost savings from avoiding fines and a potential competitive advantage.” 

While the overall balance of responses suggests a net positive impact, the impact evidence received is inconclusive, given the small sample size and the limited representativeness of respondents.

Individual rights 

With 11 responses received, there was limited engagement on the impact considerations of the call for evidence on individual rights. While seven respondents agreed that there are likely to be costs associated with people exercising their information rights, only one respondent elaborated further on the cost implications of compliance:

“A resource cost exists already for Data Subjects exercising their Rights. Additional FTE costs incurred through the introduction and production of Policies, Procedures, Forums and other associated Framework construction.”

Overall, the impact evidence for this call for evidence is inconclusive, given the small number of responses received.

Controllership 

As with the other calls for evidence, there was limited engagement on the impact of the proposals on controllership, with only eight responses received. When asked about the impact[64] of the proposals on controllership:

  • One respondent thought they would have a major impact on their organisation. 
  • Two respondents thought there would be a moderate impact.
  • Three respondents thought the impact would be minimal.
  • Two respondents thought there would be no impact.

Of these, only two provided further information. 

Respondents who indicated that the proposed regulatory guidance would have a major or moderate impact highlighted:

“Depending on the regulatory approach adopted, the concepts of controllers and processors will be defined accordingly and consequently their responsibilities and liabilities.” 

“if the understanding of when a developer/provider of AI is to be considered as (joint) controller was to be broadened, this may significantly increase the obligations to comply with data protection law and constitute an additional obstacle to making AI services available on the market.”

While the responses received indicate that the proposals are, on balance, a net cost to organisations, it is challenging to draw conclusions for the market as a whole due to the limited sample of survey responses.

 


[63] On the basis of the absolute number of respondents that answered that the approach would result in costs or benefits for their organisation.

[64] Respondents were asked what scale of impact the proposals would have on their organisation and prompted to provide further details. The controllership call for evidence did not ask whether impacts would be positive or negative.