Introduction
In recent decades, advances in genomics (the study of the genome) have greatly increased understanding of the entire DNA sequence, as well as the interactions between genes. The genome includes genes, which only make up 1-5% of a human’s genome, and all of a person’s other DNA1. The field involves a wide range of different technologies and techniques that have continued to evolve rapidly since scientists first sequenced2 a human genome in 2003.
These technologies pose a range of data protection issues. In addition to genomic sequencing, one technique of particular interest in this chapter is polygenic risk scoring. This looks at the potential impact of many genomic markers to estimate “an individual’s genetic risk for some trait” or disease.3
In healthcare, advances in genomics are moving us towards a possible future of predictive and personalised treatments for complex diseases. These are based on a person’s genome, and clinical use of disease risk scores drawing on genomic insights and other information.
Outside of healthcare, future applications of more sophisticated genomic insights remain speculative. However, they could extend beyond existing applications for current, more limited tests (such as many popular direct-to-consumer genetic tests, which only analyse specific sections of DNA)4. It is still highly uncertain what depth and quality of insight future analysis of genomic information may reveal. However, solutions that consider health and behaviour are already under development. This raises growing concerns about accuracy and fairness, as well as the privacy impacts that such insights may have on newborns and relatives.
About genomics
As costs fall and sequencing technologies continue to improve, it is now possible to sequence and analyse the whole human genome of large numbers of people. Larger and higher quality datasets are available for analysis using increasingly advanced machine learning and deep learning techniques. In future, genomic insights may also be combined with further medical and environmental information to provide a more complete picture of a person.
As a result, wider understanding of how a person’s genome can influence polygenic traits (traits that are influenced by multiple genes) is improving fast. Polygenic traits include physical characteristics such as height or risk of diseases such as cancer, but also behavioural characteristics and skills. For example, recent studies have aimed to quantify the extent to which certain behavioural traits are influenced by the genome. This includes susceptibility to certain substance abuse disorders (42-79%), conscientiousness (44%) and extraversion (53%).5 In 2022, Nature Genetics published the largest study ever conducted on genetic associations with “educational attainment” using a direct-to-consumer (DTC) genetic dataset of three million records.6 These studies are drawing complex, uncertain and contested inferences through the use of highly sensitive information. Making inferences about people from genomic analysis heightens the risk of inappropriate bias and discrimination.7
State of development
Most genomics research, investment, and market growth focuses on further developing underlying technologies and healthcare uses (primarily in diagnostics, drug development and precision medicine). Genomics in medicine is “already saving lives”,8 for example by enabling diagnosis and treatment of rare conditions.
Some predict AI-powered genomic drug development and personalised medicine will play an increasingly important role in healthcare in five to 10 years.9 The UK has several major genomics projects and is aiming to become the first nation to “offer whole genome sequencing as part of routine care”.10 The US and China are among other countries with significant genomic industries.
Outside of healthcare, direct-to-consumer (DTC) genetic testing is a well-established market. Providers offer tests for purposes such as:
- tracing ancestry;
- “polygenic scoring” for insights into wellness (eg, genetic markers of how much alcohol you have consumed);
- predisposition to disease; or
- skills such as recognising musical pitch.
Some third-party services also offer personalised health or fitness recommendations based on DTC tests.
Currently, investment in non-healthcare applications of genomics is still very limited.11 However, as insights expand and technological underpinnings continue to improve, we may see novel use cases for health or behavioural polygenic risk scores in non-healthcare applications emerge in future. The Government Office for Science has flagged potential applications, such as interventions in early-childhood education, sports nutrition, and health or personality screening in employment. Such use cases are not yet scientifically validated and should be treated with caution. Timescales for potential developments also remain unclear. There are however examples of novel patents and tests, such as an overseas law enforcement trial using genomics to predict the physical features and gender of suspects.12
There could also be future uses for genomic insights in insurance to determine the risk of disease. In the UK, a voluntary agreement between insurers and government limits the use of genetic information to very specific circumstances. However, these limitations can be revised in response to market changes and technological developments,13 so it is possible insurers may seek more access to genomic data in future.
Fictional future scenario
Nadia, 10, is a gifted, prospective professional athlete. Her genome was sequenced at birth to inform her future healthcare. Her parents share this information with a firm offering polygenic risk scoring for elite training and nutrition management. The firm then provides a tailored plan linked to a third-party app. The firm notes that the scores are an estimate, but doesn’t provide much information about its scoring model. Nadia’s parents share the plan and scores with her coach. They don’t know that diversity gaps in the training information mean the model significantly overestimates the importance of certain traits. This affects the validity of Nadia’s training plan.
Under the contract, Nadia’s parents also agree to the use of her anonymised genomic information for research purposes. Five years later, the firm sends Nadia’s parents new insights. The scores predict that Nadia is a risk taker, with an elevated future risk of depression and sleep disorders, and a predisposition to several other traits. This worries them all. Could these new scores affect her chances of being picked for an elite training programme, if the programme asks for health information? How could the firm know so much more about her, if her personal information was anonymised?
Nadia’s older brother Omar also isn’t pleased the firm has shared Nadia’s genomic information in this way – this information also reveals a lot about him. He wants the firm to delete the information from everywhere other than Nadia’s medical record. Can he do this?
Data protection and privacy implications
- Accuracy and fairness: for some conditions, current genomic sequencing techniques can establish a clear genetic link. However, the predictive power of polygenic scoring for many other medical conditions remains limited. This is due, for example, to the sheer number and complexity of markers that may be associated with a trait, and to limited diversity in datasets.14
Given the current limitations of the science, the predictive capabilities of polygenic risk scoring for behavioural traits are even more contested.15 It is highly uncertain whether significantly more accurate scores for particular traits will ever become available. As we have seen in our work on biometrics and neurotech, there is a risk that we may see inaccurate (or low accuracy) applications used in future, particularly outside of healthcare. Like outputs of many other AI systems, a polygenic risk score is “intended to represent a statistically informed guess”.16 As models improve, there is a risk that organisations (or people) over-rely on their predictive power due to a lack of understanding or a failure to make limitations and biases clear. There is also a risk that new use cases could exacerbate power imbalances or even lead to genetic discrimination.17
Should an organisation seek to process genomic information, or use polygenic risk scores, it must ensure the intended processing is fair. That means people’s personal information is being used in ways they would reasonably expect and will not have unjustified adverse impacts on them. Organisations must also ensure that any scoring is sufficiently accurate for the purpose and that they explain any limitations.
- Anonymisation and security: Genomic data is special category information. It is difficult to effectively anonymise, given the uniqueness of the information.18 In addition, genomes may need to be stored for a long time for research purposes to obtain insights. Similar to biometric information, our genomes stay with us for life, which means that the impacts of a data breach can be serious. It is hard to predict what future insights the genome may identify. Given the sensitivity of the information involved, strong security measures and controls over the use of the personal information are particularly important.
- Third party privacy: The disclosure and analysis of genomic information (including genomes of deceased people) also affects related family members. If uses expand significantly beyond medical settings, it could become even harder for family members to maintain control over their information. This is a complex issue, particularly when considering questions such as lawful bases for processing, and processing subject access requests for genomic information that could identify others.19
Recommendations and next steps
- We will continue to engage with and monitor this complex and rapidly evolving space. As a first step, we are developing a further in-depth tech futures report on genomics, to be published in 2024.
- We will also explore the regulatory landscape to build our knowledge and identify areas of critical intersection and collaboration.
- We will continue to remain alert to examples of misuse of genetic information, or genetic discrimination arising from current and novel uses of genomic information. Should use cases evolve further, particularly beyond medical settings, we will monitor the risk of misinterpretation or over-reliance on the predictive capabilities of polygenic risk scores.
- Employers considering genetic testing should refer to our updated employment guidance “What if we use genetic testing?” and insurers should refer to the Code on genetic testing and insurance.
Further reading
- Government Office for Science report on “Genomics Beyond Health” (2022)
- UK Government policy paper on “Genome UK: the future of healthcare” (2020)
- UK Government policy paper on “Genome UK: 2022 to 2025 implementation plan for England”
- Ada Lovelace Institute report on the use of AI in Genomics “DNA.I.” (2023)
- Centre for Educational Neuroscience blog on “can polygenic scores predict educational outcomes?” (2023)
- Research paper on Predicting Physical Appearance from DNA Data—Towards Genomic Solutions (2022)
- ICO guidance on special category data
- UNESCO Universal Declaration on the Human Genome and Human Rights
1 Genomics England webpage on Understanding Genomics. Genomics is distinct from genetics, which only looks at genes.
2 Before DNA can be analysed, it needs to be sequenced – converted into the basic building blocks of DNA, represented by a series of four letters that can be read by a computer.
3 Nuffield Department of Population Health article from the Frontiers journal on “Calculating Polygenic Risk Scores (PRS) in UK Biobank: A practical guide for epidemiologists” (2022)
4 Government Office for Science report on “Genomics Beyond Health” (2022)
5 Government Office for Science report on “Genomics Beyond Health” (2022)
6 Nature Genetics article on “Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals” (2022)
7 See, eg Centre for Educational Neuroscience blog on “can polygenic scores predict educational outcomes?” (2023); Ada Lovelace Institute report on the use of AI in Genomics “DNA.I.” (2023)
8 Gartner article on the Hype Cycle for Life Science Discovery Research, 2023
9 Gartner article on the Healthcare and Life Science CIO’s Genomics Series: Part 1 – Understanding the Business Value of Omics Data; Ada Lovelace Institute report on the use of AI in Genomics “DNA.I.” (2023)
10 NHS England webpage about the NHS Genomic Medicine Service
11 Ada Lovelace Institute report on the use of AI in Genomics “DNA.I.” (2023)
12 Australian Federal Police (AFP) media release on advanced technology which allows AFP to predict criminal profiles from DNA (2021)
13 UK Government corporate report on the Code on Genetic Testing and Insurance: 3-year review 2022
14 Genome Medicine article on “Polygenic risk scores: from research tools to clinical instruments” (2020)
15 Government Office for Science report on “Genomics Beyond Health” (2022); Centre for Educational Neuroscience blog on “can polygenic scores predict educational outcomes?” (2023); Ada Lovelace Institute report on the use of AI in Genomics “DNA.I.” (2023)
16 ICO guidance on AI and data protection: What do we need to know about accuracy and statistical accuracy?
17 To date, no evidence of such discrimination has been reported in the UK. There have been limited cases overseas, for example where individuals have been denied insurance: European Journal of Human Genetics article on “Genetic discrimination still casts a large shadow in 2022” (2022)
18 Ada Lovelace Institute report on the use of AI in Genomics “DNA.I.” (2023)
19 PHG Foundation report on The GDPR and Genomic Data (2020)