There are three main approaches to the use of genomic data that are of particular interest to us:
- Collection and use of genomic information for research and discovery in healthcare and associated research.
- Application and translation of genomic information for non-health related use, such as direct-to-consumer advice for lifestyle and social traits.
- Familial and ancestral analysis.
Our research indicates a significant rise in organisations using genomic information. Research and medical uses of genomic data are already well advanced, but other sectors are likely to increase their uptake of genomic data in the near term. Some sectors, such as the military, have uses of genomic data that are beyond the scope of this report. We have identified the following sectors where we anticipate that uses of genomic data may have a major impact on UK markets in the next two to seven years:
- The medical research sector will continue to expand upon genome-wide association studies (GWAS) and the examination of polygenic diseases and traits.
- The health sector may explore the potential of predictive healthcare (P4 medicine), drawing upon polygenic risk scoring to provide lifestyle and dietary advice as well as preventative treatments.
- The wellbeing, direct-to-consumer and sports sectors may utilise genomic data and polygenic risk scoring to build on the rapidly developing market in familial, ancestry and trait tracking, as well as dietary advice and prenatal testing.
- The education sector may seek to use polygenic risk scores to identify special educational needs and disabilities (SEND) requirements for students and likely resources for schools.
- The insurance sector may consider using polygenic risk scoring to inform insurance offerings across, for example, the health, life and motor insurance sectors.
- The law enforcement sector may seek to use genomic, rather than genetic, data to identify suspects. Phenotypes inferred from genomic data may also lead to further routes of identification via facial recognition.
[Diagram: sectors mapped against anticipated timeframes for uptake of genomic data, grouped as 2-3 years, 4-5 years, 5-7 years and 10+ years.]
While the above diagram provides a very brief overview of our findings, it is helpful to explore potential uses of genomic data within these sectors from a data protection perspective, before examining the issues they may raise.
Please note that these scenarios are intended to explore possible developments and uses of genomic information. While the scenarios include high-level commentary on aspects of relevant data protection compliance issues, this should not be interpreted as confirmation that the relevant processing is either desirable or legally compliant. This document does not provide ICO guidance.
Short-term (2-3 years):
The health sector is likely to make the most impactful use of genomic data in the next two to three years. We are likely to see increased collection and use of genomic information to develop preventative and personalised healthcare.
The UK government may consider a wider drive to gather citizen genomic data, to provide more effective, efficient and timely healthcare. A nationally funded trusted research environment (TRE) could hold the information. This would allow for controlled, closely monitored access to the high-risk information. Such an approach could provide a world-leading resource for research, allowing scientists to identify the causes of heritable traits relating to complex illnesses and conditions. In turn, this could create a lifelong basis for proactive treatment. Genomic research could use approaches that pseudonymise information. However, pseudonymisation becomes complex if organisations then use the information for personalised healthcare and direct interventions for patients and the wider public.
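As a minimal sketch of what pseudonymisation within such a TRE could involve, the example below replaces a direct identifier with a keyed hash before records are released to researchers. The field names, key handling and record layout are hypothetical, and keyed hashing is only one common approach:

```python
import hmac
import hashlib

# Hypothetical sketch: the key, field names and record layout are invented
# for illustration. The secret key would be held only by the TRE operator,
# so pseudonyms cannot be reversed or re-linked to patients without it.
SECRET_KEY = b"example-key-held-only-by-the-tre-operator"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a stable HMAC-SHA256 pseudonym."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"patient_id": "EXAMPLE-0001", "variant_calls": ["rs429358:CT"]}
record["pseudonym"] = pseudonymise(record.pop("patient_id"))
```

Note that this only addresses direct identifiers: the genomic data itself can remain identifying, which is one reason pseudonymisation becomes complex once the information is used for direct interventions with identifiable patients.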
Data protection concerns
Pseudonymisation will be a particular challenge, given the desire to link special category information within healthcare records with genomic data and polygenic risk scores via algorithmic analysis. 13 Using polygenic risk scoring, organisations could share the probabilities of conditions and diseases with healthcare providers to recommend lifestyle changes and preventative treatments. With the proliferation of wearables and wellbeing tech, people could also provide more data about their lifestyle and environment. This might allow for more refined results and predictive abilities, and could support future research into links between the polygenic liability of a trait or disease and environmental factors. In turn, this may lead to organisations focusing more on collecting additional types of information to generate further inferences.
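For context, a polygenic risk score is typically computed as a weighted sum of the risk alleles a person carries, with weights taken from GWAS effect-size estimates. The following is the standard formulation found in the research literature, not any specific provider's method:

$$\mathrm{PRS}_j = \sum_{i=1}^{M} \hat{\beta}_i \, x_{ij}$$

where $x_{ij} \in \{0, 1, 2\}$ is the number of risk alleles person $j$ carries at variant $i$, and $\hat{\beta}_i$ is the effect size estimated for that variant in a GWAS of the trait. The raw score is usually standardised against a reference population and reported as a percentile or converted into a probabilistic risk estimate, which is what would be shared with healthcare providers in this scenario.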
Such an expansion would change the purpose of processing people’s genomic data from a specific purpose to a possibly more generalised and speculative one. The data controller would need to consider whether the new purpose was compatible with the original one. They would also need a lawful basis for the new processing operation. If they originally collected the information using consent, they would likely need to collect fresh consent, unless they informed people of this potential use beforehand. This approach could also raise concerns about fairness, given that people may not have expected the organisation to use their genomic data in this way.
Transparency will be essential for people to understand how organisations are using their information. This scenario may present highly complex data flows between public and private organisations (e.g. from a research environment to a healthcare provider to people receiving the advice and recommendations). Given the potential impacts and high-risk nature of the collected information, we would also expect organisations to have appropriate security measures in place as set out under Article 5(1)(f).
Organisations would need to mitigate the risk of systemic discrimination emerging, whether through combined records or through repurposed AI models originally built for something other than analysing genomic information. We discuss these risks, and how data protection law would potentially apply, later in this report.
An extension into the use of wearable technologies to gather additional lifestyle information, such as daily activity, may create further challenges. It may not be clear to people or organisations what constitutes health or wellbeing information. There may also be a fundamental challenge to fairness of access to treatment, with those willing or able to pay for the additional devices accessing more granular and accurate outcomes.
Data controllers will also face a challenge in upholding appropriate data rights. If someone submits a subject access request to a healthcare provider, or another organisation that uses genomic data, the organisation will need to consider carefully what information it can provide. As genomic data can reveal highly sensitive or intimate insights about third parties such as family members, providers will need to consider whether and how to appropriately limit or redact such information.
Medium-term (3-5 years):
The education sector may consider using genomic data to enhance SEND support in schools. 14
The government may initiate a public-private partnership to combine publicly-held genomic data with the generation of polygenic risk scoring. This would build on early projects to gather newborn genomic data for long-term preventive healthcare purposes. 15 This partnership could support assessments of SEND funding requirements and screening for traits and disorders. There may be a particular focus on traits related to ADHD and dyslexia. In turn, this could be combined with healthcare records to develop further inferences and increase the accuracy of probabilistic scores. Private organisations could act on behalf of the educational trusts to identify likely needs for students, providing additional information to both schools and teachers to support targeted, effective help.
Developing this hypothetical approach, the partner organisation could suggest that parents or schools could pay for additional polygenic risk scores as a direct-to-consumer option. The organisation could provide the information via a third-party app. This would allow parents to research traits around educational attainment 16 as well as sporting and musical ability. The organisation would present the results as probabilities. However, there may be little supporting material to show how they achieved those results and any limitations, such as the impact of the environment or appropriate actions if the user has particularly high or low risk results. As students age, this record could be combined with healthcare records and form part of a permanent, lifelong citizen record.
Data protection concerns
There is a risk of a lack of transparency as information moves between the public and private sector, inhibiting people’s ability to exercise their data rights and ensure fair treatment. Furthermore, organisations may have to shift their purpose of processing from funding assessment to trait analysis, creating challenges for purpose limitation and lawfulness of processing.
Depending on the level of human involvement, there may be circumstances where decision-making based on polygenic risk scores amounts to automated individual decision-making with legal or similarly significant effect, within the meaning of Article 22. Controllers would need to make sure that they had met the UK GDPR conditions for carrying out such processing, as well as meeting enhanced transparency requirements. 17
Controllers would also need to put safeguards in place to protect people’s rights in such scenarios. This includes allowing people to challenge solely automated decisions and to obtain human intervention in the decision-making.
Accuracy is also an area of significant risk, given that educational achievement is a highly polygenic trait, shaped by thousands of genetic variants and by the environment. Any scores assigned would be entirely probabilistic. 18 The links between inherited intelligence (itself a highly contested notion) and environment require significant further research. We do not yet fully understand, nor can we address, the highly complex and person-specific relationship between genomics and environment. A complex example could be a pupil whose polygenic score suggests low risk for poor educational attainment but who lives in a high-risk environment. Rare variants, unusual circumstances and unrecognised needs may lead to unfair treatment of pupils and systemic discrimination, as results are ‘ported’ rather than targeted. This may, in turn, lead to organisations using ever wider sets of information to try to address the challenge, raising issues of data minimisation and transparency.
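To illustrate why such scores remain probabilistic even at their theoretical best, consider an illustrative calculation using only the roughly 40% heritability figure cited in footnote 18. If a polygenic index captured the full heritable share, so that $R^2 = 0.40$, the correlation between score and outcome would be

$$r = \sqrt{R^2} = \sqrt{0.40} \approx 0.63,$$

and the residual variation around any prediction would be $\sqrt{1 - R^2} = \sqrt{0.60} \approx 0.77$ of the population standard deviation. Even at this ceiling, 60% of the variation between pupils would remain unexplained, and footnote 18 notes that current scores capture substantially less.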
Processing children’s information will also require particular care given the sensitivity of the information. If organisations use consent for processing, they will face challenges. Ages of consent will shift as pupils grow up and become responsible for their own information. Organisations will also need to think about what it means to fully inform users when they are providing consent.
Furthermore, if power imbalances exist between an organisation and a person, as is likely in the context of a school, then it’s unlikely that consent will be appropriate. Organisations may also have issues with data retention and risks of inaccuracy that may stem from limited data sets. Finally, as with all instances of genomic data, third-party data presents a risk of revealing inferences about family members’ health and characteristics.
The insurance sector may also seek to build on genomic data in this period, if organisations can use polygenic risk scores to create more accurate risk estimates. 19 Possible areas for rapid uptake of polygenic risk scoring include health and life insurance. Drawing directly upon probabilistic risks of inheritable diseases and conditions, from cancer to Parkinson’s disease, insurance providers may offer increasingly personalised insurance plans. These could offer highly targeted cover at affordable rates through direct-to-consumer channels.
However, as providers seek to offer increasingly holistic lifestyle analyses, they could extend this approach to cover traits such as risk-taking behaviours, including:
- certain physical activities (sports deemed to be high risk);
- drinking;
- smoking; or
- sexual behaviour. 20
Much of this information is likely to be special category data as defined under Article 9(1) and high risk in its uses and impact.
Some may claim that such an approach offers reasonably accurate and cost-effective products. However, there is also a significant risk that the information becomes increasingly biased against those with specific perceived genomic traits, leading to systemic discrimination. It may also lead to aggressive pricing against those who are unable or unwilling to provide their information. Ultimately, insurance providers may decide to refuse insurance to those deemed too high a risk, leading to a fundamentally unfair use of personal information.
A significant challenge in this sector would be the need for data minimisation and finding an appropriate purpose. Providers may also struggle to ensure transparency and fairness in the use of information. Insurers might, for example, seek to rely on explicit consent if they are using special category information. However, such consent may not be freely given when the only alternative products available to customers are significantly more expensive.
The threshold of accuracy for the use of polygenic risk scoring would be of particular interest in this scenario. Insurance providers might face significant challenges over the information they use to derive risks and, in particular, over the gap between derived polygenic risks and the customers’ environments. 21 Fundamentally, providers would need to establish a clear sense of an acceptable level of probability for fair and accurate information use which would outweigh potential risks.
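As a purely hypothetical sketch of that problem, the example below shows how a hard decision threshold applied to a probabilistic score converts uncertain information into categorical outcomes such as refusal. All names, figures and thresholds are invented for illustration and are not drawn from any real insurer’s practice:

```python
# Hypothetical illustration only: invented figures, not a real pricing model.
# A probabilistic polygenic risk estimate is turned into a categorical
# underwriting outcome by a hard threshold. Small changes in the threshold,
# or small errors in the score, flip people between categories.

BASE_PREMIUM = 500.00      # illustrative annual premium (GBP)
REFUSAL_THRESHOLD = 0.30   # illustrative lifetime-risk cut-off

def underwriting_decision(estimated_lifetime_risk: float) -> str:
    """Map an uncertain risk estimate onto a categorical offer."""
    if estimated_lifetime_risk >= REFUSAL_THRESHOLD:
        return "refused"
    # Premium scales with estimated risk, so any error in the score
    # feeds directly into the price a person pays.
    premium = BASE_PREMIUM * (1 + estimated_lifetime_risk * 2)
    return f"offered at £{premium:.2f}/year"

# Two applicants whose true risk may be identical, but whose estimated
# risk differs only by the score's error margin:
print(underwriting_decision(0.29))  # offered at £790.00/year
print(underwriting_decision(0.31))  # refused
```

The point of the sketch is that any ‘acceptable level of probability’ a provider settles on becomes a hard boundary for people whose scores sit near it, which is why the accuracy and fairness questions above are so consequential.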
Medium to long-term (5-10 years):
The law enforcement sector may make more use of genomic data for crime detection and sentencing, using an increasingly broad array of information and inferences to identify potential suspects.
A private sector company could offer GWAS-based analysis of crime scene samples to support murder investigations and cold cases, drawing upon sequences derived by whole exome sequencing (WES) and whole genome sequencing (WGS). 22 The company could use the sample’s analysis to compare DNA profiles across police databases and to predict a suspect’s facial features, age (via telomere length) and sex from phenotype information. Combined with AI processing, they could generate suspect e-fits. The company may suggest that, in turn, the police could compare these against social media records to identify a potential suspect, or run them through facial recognition systems at events and in public spaces.
The company may also seek to conduct research on the genomic information they hold. They could explore a possible future add-on service that provides information on specific disease-related, psychological or character-related traits derived from the sample. They could do this to create multi-modal analyses of suspects, combining behavioural inferences with biometrics, alongside AI-driven facial recognition tracking.
Data protection concerns
This approach would likely raise significant challenges around data retention and purpose limitation. A specific challenge could arise if an organisation attempted to use direct-to-consumer databases to identify people of interest, as has already been attempted in the US. 23 It could also raise issues around lawfulness, as any use of personal information for law enforcement purposes must be necessary. This does not mean that the company’s use of the information must be essential, but it must be a targeted and proportionate way of achieving the purpose. A lawful basis will not apply if an organisation can reasonably achieve the purpose by some other, less intrusive means. The police might argue that such an approach was necessary in genuinely exceptional cases, which they could not solve by other means. However, they would need to demonstrate that their use of genomic information was proportionate.
Using genomic information in this way is also likely to be fundamentally unfair and beyond the organisation’s original purpose. It is also likely to face challenges over the potential for inappropriate bias and discrimination, and the significant chilling effect it may have on society. The example involves several processing operations which we would consider to be high risk. The organisation would need to carry out a data protection impact assessment and consider whether and how they could mitigate the data protection risks.
12 Although this is currently illegal in the UK under HFEA regulations, this report will consider the implications of information generated by such an approach.
13 Chapter 3 anonymisation guidance
14 This is an area which organisations such as the NCOB intend to examine further in 2024/25 with regard to scientific and ethical practice.
17 Article 13(2)(f) and 14(2)(g) require controllers who are carrying out automated processing, including profiling, to provide “meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject” as part of their transparency requirements.
18 Essentially, polygenic risk scores are not certain in the information they provide and do not explain all of a trait’s heritability, which for educational attainment stands at about 40%. If all of that heritability could be recovered via GWAS, the explanatory power of the EA PGI could reach 40% of the differences between pupils. This would require including rare variants (and perhaps gene-gene and gene-environment interplay). Stakeholders have noted that research is currently some way from that point, but is heading in a direction where this may be achieved.
19 Code on Genetic Testing and Insurance
20 Although predictions around this trait are highly contested in the research community. See Genetic Influences on Adolescent Sexual Behavior: Why Genes Matter for Environmentally-Oriented Researchers and The role of sex in the genomics of human complex traits for some further information.
21 Data subjects may also be entitled to request information about how any automated decisions have been taken, and contest those decisions, in line with the safeguards set out under Article 22.
22 Supercharged crime-scene DNA analysis sparks privacy concerns
23 Cops Used DNA to Predict a Suspect’s Face—and Tried to Run Facial Recognition on It