
This section is aimed at helping technical specialists and those in compliance-focused roles to understand how statistical accuracy is linked to fairness.

Control measure: There are appropriate and effective measures in place to ensure that, where datasets are relied on for determining statistical accuracy, they are accurately and fairly labelled and do not cause detriment to people.

Risk: If datasets are not accurately labelled, any statistical accuracy derived from them cannot be relied on. This may result in processing activities and outcomes that are unfair to people and may breach UK GDPR articles 5(1)(a) and 9.

Ways to meet our expectations:

  • Document and implement information management processes that detail the data labelling requirements and steps.
  • Train staff responsible for labelling datasets appropriately, to ensure the quality and accuracy of the labels they set.
  • Include quality assurance (QA) procedures into the labelling process, including:
    • 'human in the loop' QA processes for any automated labelling processes (a minimal sketch of such a check follows this list); and
    • security and QA measures, if you use a third-party data labelling service.
  • Include labelling for blind spots and biases, and ensure all labelling testing includes an analysis of 'edge cases' (rare or unusual situations), so these are not excluded, missed or misinterpreted.
  • Ensure the minimum amount of information necessary is used in the process (eg by adding data incrementally to the labelled datasets).
  • Ensure that a resolution process is in place if disagreements on labelling for edge cases occur.
  • Document all decisions about features that won't be labelled.
  • Keep data labels under review.
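
Where automated labelling is used, a simple human-in-the-loop QA step is to route a random sample of auto-labelled records to a reviewer and check the observed label error rate against a documented tolerance. The sketch below is illustrative only: the sample size, the tolerance and the human_verify routing step are assumptions you would replace with your own documented process.

```python
import random

LABEL_ERROR_TOLERANCE = 0.02   # illustrative documented tolerance (2% label errors)
QA_SAMPLE_SIZE = 200           # illustrative documented QA sample size


def human_verify(record):
    """Placeholder for the human-in-the-loop step: a reviewer confirms or
    corrects the automatically assigned label and returns the correct one."""
    raise NotImplementedError("Route the record to a human reviewer here.")


def qa_automated_labels(auto_labelled):
    """Sample auto-labelled records, have a human verify them, and report the
    observed label error rate against the documented tolerance."""
    sample = random.sample(auto_labelled, min(QA_SAMPLE_SIZE, len(auto_labelled)))
    errors = sum(1 for rec in sample if human_verify(rec) != rec["label"])
    error_rate = errors / len(sample)
    return {
        "sample_size": len(sample),
        "error_rate": error_rate,
        "within_tolerance": error_rate <= LABEL_ERROR_TOLERANCE,
    }
```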

Options to consider:

  • Ensure the labelled information used to train the AI is based on a statistically representative sample, so it does not bias the results. 
  • Use multiple human reviewers to apply labels independently and compare where labels differ (a minimal sketch of this comparison follows this list).
  • Consult with research, academic papers and sector requirements to determine data labels. This could include consultation with members of protected groups or their representatives to define the labelling criteria.
  • Leverage crowdsourcing platforms and crowd annotation techniques to engage a diverse pool of annotators for labelling datasets. This allows for multiple perspectives and quality checks on labelled data, helping to mitigate biases and errors introduced by individual annotators. 
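
Where multiple reviewers label independently, a common way to quantify their agreement is Cohen's kappa, with disagreements fed into the documented resolution process. The example below is a minimal sketch assuming scikit-learn is available; the label values are illustrative.

```python
from sklearn.metrics import cohen_kappa_score

# Illustrative labels from two independent reviewers for the same records;
# in practice these would come from your labelling tool's export.
reviewer_a = ["approve", "reject", "approve", "approve", "reject", "approve"]
reviewer_b = ["approve", "reject", "reject", "approve", "reject", "approve"]

# Cohen's kappa measures agreement beyond chance: 1.0 is perfect agreement,
# 0.0 is no better than chance. Low values suggest unclear labelling criteria.
kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Inter-annotator agreement (Cohen's kappa): {kappa:.2f}")

# Flag records where the reviewers disagree, for the documented resolution
# process (eg escalation to a senior reviewer).
disagreements = [i for i, (a, b) in enumerate(zip(reviewer_a, reviewer_b)) if a != b]
print(f"Records needing resolution: {disagreements}")
```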

 

Control measure: The statistical accuracy of new AI systems, or of changes to existing AI systems, is tested, documented, and consistently achieves the required accuracy level before the go-live decision. The decision-making process to go live is documented and includes confirmation that the organisation's required statistical accuracy level is achieved.

Risk: Without a structured testing process, there is a risk that pre-implementation testing will not take place or be completed effectively. If pre-implementation testing does not occur, issues with statistical accuracy may not be picked up in a timely manner and inaccurate or biased system outputs may occur. By not documenting the outcomes of such testing, there is no audit trail. This may breach UK GDPR article 5(1)(a).

Ways to meet our expectations:

  • Establish a model evaluation framework to systematically assess the performance of the AI system against predefined accuracy metrics before implementation. This involves designing experiments, conducting tests, and analysing results to quantify the system's predictive performance and identify areas for improvement.
  • Ensure the test plan includes all the relevant checks to provide assurance that there are no errors in data outputs or statistical errors.
  • Document target accuracy rates and tolerances for errors in the test plan.
  • Include minimum success criteria for current performance in the test plan, and monitor false acceptance and rejection rates.
  • Run and test the system in 'ghost mode' or in a testing environment to understand accuracy levels.
  • Retrain the AI system following accuracy testing.
  • Test the AI system on new data set(s) to confirm the same outcome is reached.
  • Conduct 'decision gate' reviews as part of the go-live process to evaluate the results of statistical accuracy testing and make informed decisions about system readiness (a minimal sketch of such a check follows this list).
  • Ask senior management to sign off acceptance of the test results. 
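
One way to support a decision-gate review is to compare the test results against the accuracy rates and error tolerances documented in the test plan, and record the outcome as evidence for the go-live decision and senior sign-off. The sketch below assumes scikit-learn and binary decisions; the threshold values and the false acceptance/rejection definitions are illustrative, not prescribed.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative thresholds: in practice these come from your documented
# test plan, not from this sketch.
THRESHOLDS = {"accuracy": 0.95, "false_acceptance": 0.02, "false_rejection": 0.05}


def go_live_gate(y_true, y_pred):
    """Compare test results with the documented tolerances and return the
    evidence needed for the decision-gate review and senior sign-off."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    metrics = {
        "accuracy": accuracy_score(y_true, y_pred),
        "false_acceptance": fp / (fp + tn) if (fp + tn) else 0.0,  # false positive rate
        "false_rejection": fn / (fn + tp) if (fn + tp) else 0.0,   # false negative rate
    }
    passed = (
        metrics["accuracy"] >= THRESHOLDS["accuracy"]
        and metrics["false_acceptance"] <= THRESHOLDS["false_acceptance"]
        and metrics["false_rejection"] <= THRESHOLDS["false_rejection"]
    )
    return {"metrics": metrics, "go_live": passed}
```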

Options to consider:

  • Deploy dedicated validation infrastructures and testing environments to support controlled testing and evaluation of AI systems in a production-like setting. This includes sandbox environments, staging servers, and simulation tools for conducting end-to-end validation tests.
  • Apply cross-validation techniques such as k-fold cross-validation or stratified cross-validation to validate the generalisation performance of the AI model on unseen information (see the sketch after this list). This helps mitigate overfitting and ensures that the model's performance is robust across different data partitions.
  • Produce and publish reports both internally and externally on the statistical accuracy testing results.
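
As an illustration of the cross-validation point above, the sketch below uses scikit-learn's StratifiedKFold on synthetic data; substitute your own training set, model and scoring metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced data purely for illustration; use your own training set.
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.8, 0.2],
                           random_state=0)

# Stratified folds preserve the class balance in every fold, which matters
# when the outcome of interest is rare.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=cv, scoring="accuracy")

print(f"Fold accuracies: {scores.round(3)}")
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```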

 

Control measure: There are processes in place to ensure there is a human review of the statistical accuracy of the AI system.

Risk: Without a structured testing process in place, there is a risk that a human review will not be undertaken or completed effectively to provide an independent assessment of AI system outputs. This may breach UK GDPR article 5(1)(a).

Ways to meet our expectations:

  • Detail the methodology the human reviewer will use when testing the system for statistical accuracy. The test plan should outline the criteria, requirements, sampling method and sample size (a minimal sketch of the sampling step follows this list).
  • Check that the rate of error in data outputs or statistical errors is within acceptable and documented tolerances.
  • Retrain the AI system following testing (eg by improving input data, different balance of false positives and negatives, or using different learning algorithms).
  • Test the AI system on new data set(s) to confirm consistency in statistical accuracy rates. 
  • Ask senior management to sign off acceptance of the test results.
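
The sketch below illustrates the sampling step, assuming binary decisions and illustrative values for the sample size and error tolerance; the documented test plan would define the real criteria.

```python
import random

REVIEW_SAMPLE_SIZE = 100   # illustrative sample size from the test plan
ERROR_TOLERANCE = 0.05     # illustrative documented error tolerance


def sample_for_human_review(system_outputs):
    """Draw a random sample of system outputs for the human reviewer, per the
    sampling method and sample size set out in the test plan."""
    return random.sample(system_outputs, min(REVIEW_SAMPLE_SIZE, len(system_outputs)))


def review_error_rate(sampled_outputs, reviewer_decisions):
    """Compare the reviewer's independent decisions with the system's outputs
    and check the observed error rate against the documented tolerance."""
    errors = sum(1 for out, dec in zip(sampled_outputs, reviewer_decisions)
                 if out["decision"] != dec)
    rate = errors / len(sampled_outputs)
    return {"error_rate": rate, "within_tolerance": rate <= ERROR_TOLERANCE}
```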

Options to consider:

  • Employ active learning and semi-supervised learning techniques to optimise the labelling process and maximise the efficiency of human review efforts. This involves prioritising uncertain or ambiguous data points for human annotation, focusing human review effort where it is most needed (see the sketch after this list).
  • Explore automated labelling and data augmentation techniques to complement human labelling efforts and improve dataset quality. This includes leveraging machine learning algorithms for automatic labelling, data synthesis, and data enrichment to enhance the diversity and representativeness of labelled datasets.
  • Produce and publish internal and external reports on the statistical accuracy testing results.
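
A common active learning approach is uncertainty sampling: send the records the model is least confident about to human reviewers first. The sketch below assumes a scikit-learn style binary classifier with a predict_proba method; the function name and budget are illustrative.

```python
import numpy as np


def select_for_human_review(model, unlabelled_X, budget=50):
    """Uncertainty sampling: pick the records the model is least sure about
    (predicted probability closest to 0.5) so that human review effort is
    focused where it is most needed."""
    proba = model.predict_proba(unlabelled_X)[:, 1]   # P(positive class)
    uncertainty = np.abs(proba - 0.5)                 # 0.0 = least confident
    return np.argsort(uncertainty)[:budget]           # indices to label first
```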

 

Control measure: AI systems are regularly monitored or tested to ensure outputs are fair and within statistical accuracy tolerance rates, and there are no discriminatory outputs or decisions made.

Risk: Without regular monitoring or testing of the system, there is a risk of undetected model drift, and the resulting system outputs may be unfair. Unfair decisions may deny people social or economic opportunities. This may breach UK GDPR article 5(1)(a).

Ways to meet our expectations:

  • Implement a regular testing regime to monitor the AI system and introduce metrics or thresholds that trigger tests for statistical accuracy on an ongoing basis. Include relevant checks to identify errors in data outputs, tolerances for errors, and document the results.
  • Monitor the AI system for overfitting (ie when a learning algorithm pays too much attention to specific features in the training datasets, which can disadvantage people who are not similar to those in the training datasets). For example, monitor precision and recall to identify possible overfitting (a minimal sketch follows this list).
  • Test the AI system using new data set(s) to confirm the same statistical accuracy rates are reached, and change the learning process, if required.
  • Consider retraining the AI systems as necessary (eg by implementing algorithmic fairness measures, fairness constraints, improving input data, using a different balance of false positives and negatives, or using different learning algorithms).
  • Undertake regular compliance checks to provide assurance for AI systems or components managed by third parties.
  • Modify or delete information in the system that reflects past inaccuracy, if it is no longer relevant to the current decision.
  • Carry out tests which include running a traditional decision-making system and an AI system concurrently and investigate any significant difference in the type of decisions.
  • Save AI models separately so you can revert to a previous version if significant drift occurs.
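
As a minimal sketch of such a regime, the check below recomputes precision and recall over a recent window of decisions with known outcomes and flags when either falls below a documented floor, triggering a full statistical accuracy test. It assumes scikit-learn and binary decisions; the floor values are illustrative assumptions.

```python
from sklearn.metrics import precision_score, recall_score

PRECISION_FLOOR = 0.90   # illustrative documented tolerance
RECALL_FLOOR = 0.85      # illustrative documented tolerance


def monitoring_check(y_true_recent, y_pred_recent):
    """Recompute precision and recall on a recent window of decisions with
    known outcomes; a drop below either floor is a possible sign of drift or
    overfitting and should trigger the documented accuracy testing."""
    precision = precision_score(y_true_recent, y_pred_recent)
    recall = recall_score(y_true_recent, y_pred_recent)
    return {
        "precision": precision,
        "recall": recall,
        "trigger_full_accuracy_test": precision < PRECISION_FLOOR or recall < RECALL_FLOOR,
    }
```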

Options to consider:

  • Consult with external experts and review academic literature to ensure there is not a reliance on one testing mechanism only. Consider different or varying methods of testing, as appropriate.
  • Use the services of independent auditors to conduct frequent algorithmic audits, performance reviews and software assessments to evaluate the accuracy measures in place. 
  • Monitor and report statistical accuracy or AI KPIs in any internal or external accountability reports.
  • Report test results to senior management and key stakeholders.

 

Control measure: Any complaints about inaccurate, discriminatory or biased outputs from AI systems are documented, and appropriate, timely action is taken.

Risk: Without mechanisms to allow complaints to be recorded, shared and investigated collaboratively between AI stakeholders, there is a risk that AI systems generate inaccurate and uncorrected output. This may breach UK GDPR articles 5(1)(a), 12 to 15, or 22.

Ways to meet our expectations:

  • Log all complaints about your AI and track the issue, the response, and the response date (a minimal sketch of such a log follows this list).
  • Analyse the complaints to determine trends, issues, and risks, and share this with senior management.
  • Develop incident response procedures to handle and address any issues or failures identified during post-implementation testing. This includes defining escalation paths, assigning responsibility for resolution, and implementing corrective actions to mitigate the impact of incidents on system performance.
  • Establish a continuous improvement process to iteratively enhance the AI system, based on insights gained from complaints about accuracy.
  • Educate staff on how to respond to people challenging AI decisions or outputs. If the challenge is upheld, review and amend practices, where necessary.
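
A minimal sketch of a complaint log is shown below; the field names and the 28-day service-level assumption are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AIComplaint:
    complaint_id: str
    received: date
    system: str                      # which AI system the complaint concerns
    issue: str                       # eg "inaccurate output", "suspected bias"
    response: Optional[str] = None
    response_date: Optional[date] = None
    upheld: Optional[bool] = None    # recorded once the complaint is resolved


def open_overdue(log, today, sla_days=28):
    """List complaints still awaiting a response beyond the service-level
    target, so trends and risks can be reported to senior management."""
    return [c for c in log
            if c.response is None and (today - c.received).days > sla_days]
```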

Options to consider:

  • Discuss complaint types, root cause analysis and lessons learned as standing agenda items at appropriate meetings, committees or forums.