At a glance

  • Deriving the rationale explanation is key to understanding your AI system and helps you comply with parts of the GDPR. It requires looking ‘under the hood’ and helps you gather information you need for some of the other explanations, such as safety and performance and fairness. However, this is a complex task that requires you to know when to use more and less interpretable models and how to understand their outputs.
  • To choose the right AI model for your explanation needs, you should think about the domain you are working in, and the potential impact of the deployment of your system on individuals and society.
  • Following this, you should consider whether:
    • there are costs and benefits to replacing your current system with a newer and potentially less explainable AI model;
    • the data you use requires a more or less explainable system;
    • your use case and domain context encourage choosing an inherently interpretable system;
    • your processing needs lead you to select a ‘black box’ model; and
    • the supplementary interpretability tools that help you to explain a ‘black box’ model (if chosen) are appropriate in your context.
  • To extract explanations from inherently interpretable models, look at the logic of the model’s mapping function by exploring it and its results directly.
  • To extract explanations from ‘black box’ systems, there are many techniques you can use. Make sure that they provide a reliable and accurate representation of the system’s behaviour.


Selecting an appropriately explainable model:

☐ We know what the interpretability/transparency expectations and requirements are in our sector or domain.

☐ In choosing our AI model, we have taken into account the specific type of application and the impact of the model on decision recipients.

☐ We have considered the costs and benefits of replacing the existing technology we use with an AI system.

☐ Where we are using social or demographic data, we have considered the need to choose a more interpretable model.

☐ Where we are using biophysical data, for example in a healthcare setting, we have weighed the benefits and risks of using opaque or less interpretable models.

☐ Where we are using a ‘black box’ system, we have considered the risks and potential impacts of using it.

☐ Where we are using a ‘black box’ system we have also determined that the case we will use it for and our organisational capacity both support the responsible design and implementation of these systems.

☐ Where we are using a ‘black box’ system we have considered which supplementary interpretability tools are appropriate for our use case.

☐ Where we are using ‘challenger’ models alongside more interpretable models, we have established that we are using them lawfully and responsibly, and we have justified why we are using them.

☐ We have considered how to measure the performance of the model and how best to communicate those measures to implementers and decision recipients.

☐ We have mitigated any bias we have found in the model and documented these mitigation processes.

☐ We have made it clear how the model has been tested, including which parts of the data have been used to train the model, which have been used to test it, and which have formed the holdout data.

☐ We have a record of each time the model is updated, how each version has changed, and how this affects the model’s outputs.

☐ It is clear who within our organisation is responsible for validating the explainability of our AI system.

Tools for extracting an explanation:

All the explanation extraction tools we use:

☐ Convey the model’s results reliably and clearly.

☐ Help implementers of AI-assisted decisions to exercise better-informed judgements.

☐ Offer affected individuals plausible, accurate, and easily understandable accounts of the logic behind the model’s output.

For interpretable AI models:

☐ We are confident in our ability to extract easily understandable explanations from models such as regression-based and decision/rule-based systems, Naïve Bayes, and K nearest neighbour.

For supplementary explanation tools to interpret ‘black box’ AI models:

☐ We are confident that they are suitable for our application.

☐ We recognise that they will not give us a full picture of the opaque model and have made sure to clearly convey this limitation to implementers and decision recipients.

☐ In selecting the supplementary tool, we have prioritised the need for it to provide a reliable, accurate and close approximation of the logic behind our AI system’s behaviour, for both local and global explanations.

Combining supplementary explanation tools to produce meaningful information about your AI system’s results:

☐ We have included a visualisation of how the model works.

☐ We have included an explanation of variable importance and interaction effects, both global and local.

☐ We have included counterfactual tools to explore alternative possibilities and actionable recourse for individual cases.

In more detail


The rationale explanation is key to understanding your AI system and helps you comply with parts of the GDPR. It requires detailed consideration because it is about how the AI system works, and can help you obtain an explanation for the underlying logic of the AI model you decide to use.

The selection of an appropriate model can also help you provide information for the safety and performance explanation and the fairness explanation. Therefore, it is important that you carefully consider how you select your model. We would also recommend that you document your decision-making process, so you can evidence that you have considered the impact your model selection will have on the decision recipient.

Select an appropriately explainable model

Selecting an appropriate model is important whether you are procuring a model from an external vendor, or looking to build a bespoke system in-house. In both cases, you need to consider the following factors to ensure you select the most appropriate model for your needs. Where you are procuring a system, you may wish to ask the vendor about some or all of these elements, as you will need to understand the system in order to provide an appropriate explanation.

Where do we start?

Before you consider the technical factors, you should consider:

Domain: Consider the specific standards, conventions, and requirements of the domain your AI system will be applied into.

For example, in the financial services sector, rigorous justification standards for credit and loan decisions largely dictate the need to use fully transparent and easily understandable AI decision-support systems. Likewise, in the medical sector, rigorous safety standards largely dictate the extensive levels of performance testing, validation and assurance that are demanded of treatments and decision-support tools. Such domain specific factors should actively inform the choices you make about model complexity and interpretability.

Impact: Think about the type of application you are building and its potential impacts on affected individuals.

For example, there is a big difference between a computer vision system that sorts handwritten employee feedback forms and one that sorts safety risks at a security checkpoint. Likewise, there is also a difference between a complex random forest model that triages applicants at a licensing agency and one that triages sick patients in an accident and emergency department.

Higher-stakes or safety-critical applications will require you to be more thorough in how you consider whether prospective models can appropriately ensure outcomes that are non-discriminatory, safe, and supportive of individual and societal wellbeing.

Low-stakes AI models that are not safety-critical, do not directly impact the lives of people, and do not process potentially sensitive social and demographic data are likely to mean there is less need for you to dedicate extensive resources to developing an optimally performing but highly interpretable system.

Draw on the appropriate domain knowledge, policy expertise and managerial vision in your organisation. You need to consider these when your team is looking for the best-performing model.

What are the technical factors that we should consider when selecting a model?

You should also discuss a set of more technical considerations with your team or vendor before you select a model.

Existing technologies: consider the costs and benefits of replacing current data analysis systems with newer systems that are possibly more resource-intensive and less explainable.

One of the purposes of using an AI system might be to replace an existing algorithmic technology that may not offer the same performance level as the more advanced machine learning techniques that you are planning to deploy.

In this case, you may want to carry out an assessment of the performance and interpretability levels of your existing technology. This will provide you with a baseline against which you can compare the trade-offs of using a more advanced AI system. This could also help you weigh the costs and benefits of building or using a more complex system that requires more support for it to be interpretable, in comparison to using a simpler model.

It might also be helpful to look into which AI systems are being used in your application area and domain. This should help you to understand the resource demands that building a complex but appropriately interpretable system will place on your organisation.

Further reading

For more information on the trade-offs involved in using AI systems, see the ICO’s AI Auditability Framework blogpost on trade-offs.


Data: integrate a comprehensive understanding of what kinds of data you are processing into considerations about the viability of algorithmic techniques.

To select an appropriately explainable model, you need to consider what kind of data you are processing and what you are processing it for.

It may be helpful to group the kinds of data that you may use in your AI system into two categories:

i. Data that refers to demographic characteristics, measurements of human behaviour, social and cultural characteristics of people.

ii. Biological or physical data, such as biomedical data used for research and diagnostics (ie data that does not refer to demographic characteristics or measurements of human behaviour).

With these in mind, there are certain things to consider:

  • In cases where you are processing social or demographic data (group i. above) you may come across issues of bias and discrimination. Here, you should prioritise selecting an optimally interpretable model, and avoid ‘black box’ systems.
  • More complex systems may be appropriate in cases where you are processing biological or physical data (group ii. above), only for the purposes of gaining scientific insight (eg predicting protein structures in genomics research), or operational functionality (eg computer vision for vehicle navigation). However, where the application is high impact or safety-critical, you should weigh the safety and performance (accuracy, security, reliability and robustness) of the AI system heavily in selecting the model. Note, though, that bias and discrimination issues may arise in processing biological and physical data, for example in the representativeness of the datasets these models are trained and tested on.
  • In cases where you are processing both these groups of data and the processing directly affects individuals, you should consider concerns about both bias and safety and performance when you are selecting your model.

Another distinction you should consider is between:

  • conventional data (eg a person’s payment history or length of employment at a given job); and
  • unconventional data (eg sensor data – whether raw or interlinked with other data to generate inferences – collected from a mobile phone’s gyroscope, accelerometer, battery monitor, or geolocation device or text data collected from social media activity).

In cases where you are using unconventional data to support decisions that affect individuals, you should bear the following in mind:

  • you can consider this data to be of the same type as group i. data above, and treat it the same way (as it gives rise to the same issues);
  • you should select transparent and explainable AI systems that yield interpretable results, rather than black box models; and
  • you can justify its use by indicating what attribute the unconventional data represents in its metadata, and how such an attribute might be a factor in evidence-based reasoning or generate inferences that meet reasonable expectations.

For example, if GPS location data is included in a system that analyses credit risk, the metadata must indicate what interpretively significant feature such data is supposed to indicate about the individual whose data is being processed.

Interpretable algorithms: when possible and application-appropriate, draw on standard and algorithmic techniques that are as interpretable as possible.

In high impact, safety-critical or other potentially sensitive environments, you are likely to need an AI system that maximises accountability and transparency. In some cases, this will mean you prioritise choosing standard but sophisticated non-opaque techniques.

These techniques (some of which are outlined in the table in Annexe 2) may include decision trees/rule lists, linear regression and its extensions like generalised additive models, case-based reasoning, or logistic regression. In many cases, reaching for the ‘black box’ model first may not be appropriate and may even lead to inefficiencies in project development. This is because more interpretable models are also available, which perform very well but do not require supplemental tools and techniques for facilitating interpretable outcomes.

Careful data pre-processing and iterative model development can hone the accuracy of interpretable systems. As a result, the advantages gained by the combination of their improved performance and their transparency may outweigh those of less transparent approaches.

‘Black box’ AI systems: when you consider using opaque algorithmic techniques, make sure that the supplementary interpretability tools that you will use to explain the model are appropriate to meet the domain-specific risks and explanatory needs that may arise from deploying it.

For certain data processing activities it may not be feasible to use straightforwardly interpretable AI systems. For example, the most effective machine learning approaches are likely to be opaque when you are using AI applications to classify images, recognise speech, or detect anomalies in video footage. The feature spaces of these kinds of AI systems grow exponentially to hundreds of thousands or even millions of dimensions. At this scale of complexity, conventional methods of interpretation no longer apply.

For clarity, we define a ‘black box’ model as any AI system whose inner workings and rationale are opaque or inaccessible to human understanding. These systems may include:

  • neural networks (including recurrent and convolutional neural nets);
  • ensemble methods (an algorithmic technique such as the random forest method that strengthens an overall prediction by combining and aggregating the results of several or many different base models); and
  • support vector machines (a classifier that uses a special type of mapping function to build a divider between two sets of features in a high dimensional space).

The main kinds of opaque models are described in more detail in Annexe 2.

You should only use ‘black box’ models if you have thoroughly considered their potential impacts and risks in advance. The members of your team should also have determined that your use case and your organisational capacities and resources support the responsible design and implementation of these systems.

Likewise, you should only use them if supplemental interpretability tools provide your system with a domain-appropriate level of explainability. This needs to be reasonably sufficient to mitigate the potential risks of the system and provide decision recipients with meaningful information about the rationale of any given outcome. A range of the supplementary techniques and tools that assist in providing some access to the underlying logic of ‘black box’ models is explored below and in Annexe 3.

As part of the process-based aspect of the rationale explanation, you should document and keep a record of any deliberations that cover how you selected a ‘black box’ model.

Hybrid methods – use of ‘challenger’ models: when you select an interpretable model to ensure explainable data processing, you should only carry out parallel use of opaque ‘challenger’ models for purposes of feature engineering/selection, insight, or comparison if you do so in a transparent, responsible, and lawful manner.

Our research has shown that some organisations in highly regulated areas like banking and insurance are increasingly using more opaque ‘challenger’ models for the purposes of feature engineering/selection, comparison, and insight. However, they are continuing to select interpretable models in their customer-facing AI decision-support applications.

‘Black box’ challenger models are trained on the same data that trains transparent production models and are used both to benchmark the latter, and in feature engineering and selection.

When challenger models are employed to craft the feature space, ie to reduce the number of variables (feature selection) or to transform/ combine/ bucket variables (feature engineering), they can potentially reduce dimensionality and show additional relationships between features. They can therefore increase the interpretability of the production model.

If you use challenger models for this purpose, you should make the process explicit and document it. Moreover, any highly engineered features that are drawn from challenger models and used in production models must be properly justified and annotated in the metadata to indicate what attribute the combined feature represents and how such an attribute might be a factor in evidence-based reasoning.

When you use challenger models to process the data of affected decision recipients, even for benchmarking purposes, you should properly record and document them. If you incorporate the insights from a challenger model’s processing into any dimension of actual decision-making, you should treat it as a core production model and hold it to the same explainability standards. This would be the case, for example, if comparative benchmarking results are shared with the implementers or users who are making decisions.

What types of models are we choosing between?

To help you get a better picture of the spectrum of algorithmic techniques, Annexe 2 lays out some of the basic properties, potential uses, and interpretability characteristics of the most widely used algorithms at present. These techniques are also listed in the table below.

The 11 techniques listed in the left column are considered to be largely interpretable, although for some of them, like the regression-based and tree-based algorithms, this depends on the number of input features that are being processed. The four techniques in the right column are more or less considered to be ‘black box’ algorithms.

Broadly interpretable systems | Broadly “black box” systems
Linear regression (LR) | Ensemble methods
Logistic regression | Random Forest
Generalised linear model (GLM) | Support vector machines (SVM)
Generalised additive model (GAM) | Artificial neural net (ANN)
Regularised regression (LASSO and Ridge) |
Rule/decision lists and sets |
Decision tree (DT) |
Supersparse linear integer model (SLIM) |
K-nearest neighbour (KNN) |
Naïve Bayes |
Case-based reasoning (CBR)/ Prototype and criticism |

Tools for extracting explanations

Extracting and delivering meaningful explanations about the underlying logic of your AI model’s results involves both technical and non-technical components.

At the technical level, to be able to offer an explanation of how your model reached its results, you need to:

  • become familiar with how AI explanations are extracted from intrinsically interpretable models;
  • get to know the supplementary explanation tools that may be used to shed light on the logic behind the results and behaviours of ‘black box’ systems; and
  • learn how to integrate these different supplementary techniques in a way that will enable you to provide meaningful information about your system to its users and decision recipients.

At the non-technical level, extracting and delivering meaningful explanations involves establishing how to convey your model’s results reliably, clearly, and in a way that enables users and implementers to:

  • exercise better-informed judgements; and
  • offer plausible and easily understandable accounts of the logic behind its output to affected individuals and concerned parties.

Technical dimensions of AI interpretability

Before going into detail about how to set up a strategy for explaining your AI model, you need to be aware of a couple of commonly used distinctions that will help you and your team to think about what is possible and desirable for an AI explanation.

  • Local vs global explanation

The distinction between the explanation of single instances of a model’s results and an explanation of how it works across all of its outputs is often characterised as the difference between local explanation and global explanation. Both types of explanation offer potentially helpful support for providing significant information about the rationale behind an AI system’s output.

A local explanation aims to interpret individual predictions or classifications. This may involve identifying the specific input variables or regions in the input space that had the most influence in generating a particular prediction or classification.

Providing a global explanation entails offering a wide-angled view that captures the inner-workings and logic of that model’s behaviour as a whole and across predictions or classifications. This kind of explanation can capture the overall significance of features and variable interactions for model outputs and significant changes in the relationship of predictor and response variables across instances. It can also provide insights into dataset-level and population-level patterns, which are crucial for both big picture and case-focused decision-making.

  • Internal/ model intrinsic vs. external/ post-hoc explanation

Providing an internal or model intrinsic explanation of an AI model involves making intelligible the way its components and relationships function. It is therefore closely related to, and overlaps to some degree with, global explanation - but it is not the same. An internal explanation makes insights available about the parts and operations of an AI system from the inside. These insights can help your team understand why the trained model does what it does, and how to improve it.

Similarly, when this type of internal explanation is applied to a ‘black box’ model, it can shed light on that opaque model’s operation by breaking it down into more understandable, analysable, and digestible parts. For example, in the case of an artificial neural network (ANN), it can break it down into interpretable characteristics of its vectors, features, interactions, layers, parameters etc. This is often referred to as ‘peeking into the black box’.

Whereas you can draw internal explanations from both interpretable and opaque AI systems, external or post-hoc explanations are more applicable to ‘black box’ systems where it is not possible to fully access the internal underlying rationale due to the model’s complexity and high dimensionality.

Post-hoc explanations attempt to capture essential attributes of the observable behaviour of a ‘black box’ system by subjecting it to a number of different techniques that reverse-engineer explanatory insights. Post-hoc approaches can do a number of different things:

  • test the sensitivity of the outputs of an opaque model to perturbations in its inputs;
  • allow for the interactive probing of its behavioural characteristics; or
  • build proxy-based models that utilise simplified interpretable techniques to gain a better understanding of particular instances of its predictions and classifications, or of system behaviour as a whole.
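The first of these approaches, testing how sensitive an opaque model's outputs are to perturbations in its inputs, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `black_box` is a hypothetical stand-in for an opaque model, and the feature names and step sizes are invented. The sketch only shows the general idea of changing one input at a time and recording how the output moves.

```python
import math

def black_box(x):
    """Hypothetical stand-in for an opaque model; returns a score in (0, 1)."""
    z = (0.05 * x["income_k"]
         - 0.8 * x["missed_payments"]
         - 0.02 * x["income_k"] * x["missed_payments"]
         - 1.0)
    return 1 / (1 + math.exp(-z))

def sensitivity(model, instance, deltas):
    """Perturb one input at a time and record the change in the model output."""
    base = model(instance)
    effects = {}
    for name, delta in deltas.items():
        perturbed = dict(instance)   # copy, so the original case is untouched
        perturbed[name] += delta
        effects[name] = model(perturbed) - base
    return effects

# Hypothetical individual case and hypothetical perturbation sizes.
applicant = {"income_k": 30.0, "missed_payments": 2.0}
effects = sensitivity(black_box, applicant,
                      {"income_k": 5.0, "missed_payments": 1.0})

for name, change in effects.items():
    print(f"{name}: output changes by {change:+.4f}")
```

A one-at-a-time probe like this is local (it describes behaviour around a single case) and it can miss effects that only appear when several inputs change together, so it is best treated as one input into an explanation rather than a full account.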

Getting familiar with AI explanations through interpretable models

For AI models that are basically interpretable (such as regression-based and decision/rule-based systems, Naïve Bayes, and K nearest neighbour), the technical aspect of extracting a meaningful explanation is relatively straightforward. It draws on the intrinsic logic of the model’s mapping function by looking directly at it and at its results.

For instance, in decision trees or decision/rule lists, the logic behind an output will depend on the interpretable relationships of weighted conditional (if-then) statements. In other words, each node or component of these kinds of models is, in fact, operating as a reason. Extracting a meaningful explanation from them therefore comes down to following the path of connections between these reasons.

Note, though, that if a decision tree is excessively deep or a given decision list is overly long, it will be challenging to interpret the logic behind their outputs. Human-scale reasoning, generally speaking, operates on the basis of making connections between only a few variables at a time, so a tree or a list with thousands of features and relationships will be significantly harder to follow and thus less interpretable. In these more complex cases, an interpretable model may lose much of its global as well as its local explainability.
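As a minimal sketch of what 'following the path of connections between reasons' can look like in practice, the example below walks a very small decision tree and collects the condition met at each node. The tree, its thresholds, the feature names and the outcomes are all hypothetical and chosen only for illustration.

```python
# Hypothetical two-level decision tree, written as nested dictionaries.
TREE = {
    "feature": "missed_payments", "threshold": 2,
    "if_below": {
        "feature": "years_employed", "threshold": 1,
        "if_below": {"decision": "refer to a human reviewer"},
        "if_at_or_above": {"decision": "approve"},
    },
    "if_at_or_above": {"decision": "decline"},
}

def explain(tree, case):
    """Follow the case down the tree, recording each condition as a reason."""
    reasons = []
    node = tree
    while "decision" not in node:
        name, threshold = node["feature"], node["threshold"]
        if case[name] < threshold:
            reasons.append(f"{name} = {case[name]} is below {threshold}")
            node = node["if_below"]
        else:
            reasons.append(f"{name} = {case[name]} is at or above {threshold}")
            node = node["if_at_or_above"]
    return node["decision"], reasons

decision, reasons = explain(TREE, {"missed_payments": 0, "years_employed": 3})
print(decision)
for reason in reasons:
    print(" -", reason)
```

The number of reasons returned equals the depth of the path taken, which makes the point about excessive depth concrete: a path with dozens of conditions would produce an explanation that few people could follow.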

Similar advantages and disadvantages have long been recognised in the explainability of regression-based models. Clear-cut interpretability has made this class of algorithmic techniques a favoured choice in high-stakes and highly regulated domains because many of them possess linearity, monotonicity, and sparsity/ non-complexity:

Characteristics of regression-based models that allow for optimal explainability and transparency

  • Linearity: Any change in the value of the predictor variable is directly reflected in a change in the value of the response variable at a constant rate. The interpretable prediction yielded by the model can therefore be directly inferred from the relative significance of the parameters/weights of the predictor variables, and has high inferential clarity and strength.
  • Monotonicity: When the value of the predictor changes in a given direction, the value of the response variable changes consistently either in the same or opposite direction. The interpretable prediction yielded by the model can therefore be directly inferred. This monotonicity dimension is a highly desirable interpretability condition of predictive models in many heavily regulated sectors, because it incorporates reasonable expectations about the consistent application of sector specific selection constraints into automated decision-making systems.
  • Sparsity/ non-complexity: The number of features (dimensionality) and feature interactions is low enough and the model of the underlying distribution is simple enough to enable a clear understanding of the function of each part of the model in relation to its outcome.
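The three characteristics above can be seen in a minimal sketch of a linear scoring model. The weights, intercept and feature names are hypothetical. The point is only that, for a model of this kind, each feature's contribution to an individual result can be read directly from its weight.

```python
# Hypothetical sparse linear model: three features, fixed weights.
WEIGHTS = {"income_k": 0.4, "missed_payments": -5.0, "years_employed": 1.5}
INTERCEPT = 10.0

def score(case):
    """Linear prediction: intercept plus the weighted sum of the features."""
    return INTERCEPT + sum(WEIGHTS[name] * case[name] for name in WEIGHTS)

def contributions(case):
    """Per-feature contribution to this case's score (a local explanation)."""
    return {name: WEIGHTS[name] * case[name] for name in WEIGHTS}

case = {"income_k": 30.0, "missed_payments": 2.0, "years_employed": 4.0}
parts = contributions(case)

# Linearity: a one-unit change in a predictor shifts the score by its weight.
bumped = dict(case, income_k=case["income_k"] + 1.0)
shift = score(bumped) - score(case)

# Monotonicity: the sign of each weight gives the direction of its effect.
directions = {name: ("raises" if w > 0 else "lowers")
              for name, w in WEIGHTS.items()}

print(parts, round(shift, 6), directions)
```

In this sketch the weights themselves serve as a global explanation, while the per-case contributions serve as a local one. Sparsity is what keeps both readable: with three features the whole model fits in one line, which would not be true with thousands.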

In general, it is helpful to get to know the range of techniques that are available for building interpretable AI models such as those listed above. These techniques not only make the rationale behind AI models readily understandable; they also form the basis of many of the supplementary explanation tools that are widely used to make ‘black box’ models more interpretable.

Technical strategies for explaining ‘black box’ AI models through supplementary explanation tools

If, after considering domain, impact, and technical factors, you have chosen to use a ‘black box’ AI system, your next step is to incorporate appropriate supplementary explanation tools into building your model.

There is no comprehensive or one-size-fits-all technical solution for making opaque algorithms interpretable. The supplementary explanation strategies available to support interpretability may shed light on significant aspects of a model’s global processes and components of its local results.

However, often these strategies operate as imperfect approximations or as simpler surrogate models, which do not fully capture the complexities of the original opaque system. This means that it may be misleading to overly rely on supplementary tools.

With this in mind, ‘fidelity’ may be a suitable primary goal for your technical ‘black box’ explanation strategy. In order for your supplementary tool to achieve a high level of fidelity, it should provide a reliable and accurate approximation of the system’s behaviour.

For practical purposes, you should think both locally and globally when choosing the supplementary explanation tools that will achieve fidelity.
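One simple way to make 'fidelity' measurable is to check how often a supplementary surrogate agrees with the opaque model, both across the input space as a whole and in the neighbourhood of a single case. The sketch below assumes a hypothetical `black_box` classifier and a hypothetical linear `surrogate`. Agreement rate is only one possible fidelity measure, and it is used here purely for illustration.

```python
import random

def black_box(x1, x2):
    """Hypothetical opaque classifier with a curved decision boundary."""
    return 1 if x1 * x1 + 0.5 * x2 > 1.0 else 0

def surrogate(x1, x2):
    """Hypothetical interpretable surrogate: a single linear rule."""
    return 1 if x1 + 0.5 * x2 > 1.0 else 0

def fidelity(points):
    """Share of points on which the surrogate matches the black box."""
    agree = sum(black_box(*p) == surrogate(*p) for p in points)
    return agree / len(points)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Global view: points spread across the whole (assumed) input range.
global_points = [(rng.uniform(0, 2), rng.uniform(0, 2)) for _ in range(2000)]

# Local view: points close to one individual case at (1.0, 1.0).
local_points = [(1.0 + rng.uniform(-0.1, 0.1), 1.0 + rng.uniform(-0.1, 0.1))
                for _ in range(2000)]

global_fidelity = fidelity(global_points)
local_fidelity = fidelity(local_points)
print(f"global fidelity: {global_fidelity:.3f}")
print(f"local fidelity:  {local_fidelity:.3f}")
```

In this constructed example the surrogate matches the opaque model perfectly around the chosen case but not everywhere else, which is why it helps to assess fidelity both locally and globally before relying on a surrogate's explanation.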

Thinking locally is a priority, because the primary concern of AI explainability is to make the results of specific data processing activity clear and understandable to affected individuals.

Even so, it is just as important to provide supplementary global explanations of your AI system. Understanding the relationship between your system’s component parts (its features, parameters, and interactions) and its behaviour as a whole will often be critical to setting up an accurate local explanation. It will also be essential to securing your AI system’s fairness, safety and optimal performance. This will help you provide decision recipients with the fairness explanation and safety and performance explanation.

This sort of global understanding may also provide crucial insights into your model’s more general potential impacts on individuals and wider society, as well as allow your team to improve the model, so that you can properly address concerns raised by such global insights.

In Annexe 3 we provide you with a table containing details of some of the more widely used supplementary explanation strategies and tools, and we highlight some of their strengths and weaknesses. Keep in mind, though, that this is a rapidly developing field, so remaining up to date with the latest tools will mean that you and technical members of your team need to move beyond the basic information we are offering there. In Annexe 3 we cover the following supplementary explanation strategies:

Local supplementary explanation strategies | Global supplementary explanation strategies
Individual Conditional Expectations Plot (ICE) | Partial Dependence Plot (PDP)
Sensitivity Analysis and Layer-Wise Relevance Propagation (LRP) | Accumulated Local Effects Plot (ALE)
Local Interpretable Model-Agnostic Explanation (LIME) and anchors | Global Variable Importance
Shapley Additive ExPlanations (SHAP) | Global Variable Interaction
Counterfactual Explanation |
Surrogate models (SM) (Could also be used for global explanation) |
Self-Explaining and Attention-Based Systems (Could also be used for global explanation) |
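To illustrate how a local and a global strategy from this list relate to each other, the sketch below computes ICE curves for a handful of cases and then averages them into a partial dependence curve. The `black_box` function, the small dataset and the grid of income values are hypothetical. In practice you would normally use an established library and plot the curves rather than print them.

```python
import math

def black_box(age, income_k):
    """Hypothetical opaque model; returns a score in (0, 1)."""
    z = 0.08 * income_k - 0.03 * age + 0.001 * age * income_k - 2.0
    return 1 / (1 + math.exp(-z))

# Hypothetical cases as (age, income_k) and a grid of income values to try.
DATA = [(25, 20.0), (40, 35.0), (55, 50.0), (33, 28.0)]
GRID = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]

def ice_curve(row, grid):
    """ICE: vary one feature for a single case, holding its other values fixed."""
    age, _ = row
    return [black_box(age, value) for value in grid]

# One ICE curve per individual case (local view).
ice = [ice_curve(row, GRID) for row in DATA]

# PDP: the average of the ICE curves at each grid value (global view).
pdp = [sum(curve[i] for curve in ice) / len(ice) for i in range(len(GRID))]

for value, average in zip(GRID, pdp):
    print(f"income_k = {value:>4}: average predicted score {average:.3f}")
```

Because the partial dependence curve is an average, it can hide cases whose individual curves behave differently from the rest. Looking at the ICE curves alongside it is one way to check for that.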

Combining and integrating supplementary explanation strategies

The main purpose of using supplementary explanation tools is to make the underlying rationale of the results both optimally interpretable and more easily intelligible to those who use the system and to decision recipients.

For this reason, it is a good idea to think about using different explanation strategies together. You can combine explanation tools to enable affected individuals to make sense of the reasoning behind an AI-assisted decision with as much clarity and precision as possible.

With this in mind, it might be helpful to think about how you could combine these different strategies into a portfolio of tools for explanation extraction.

Keeping in mind the various strategies we have introduced in the table in Annexe 3, there are three significant layers of technical rationale to include in your portfolio:

  • visualise how the model works;
  • understand the role of variables and variable interactions; and
  • understand how the behaviours or circumstances that influence an AI-assisted decision would need to be changed to change that decision.

Here are some questions that may assist you in thinking about how to integrate these layers of explanation extraction:

  • Visualise how the model works
    • How might graphical tools like ALE plots or a combination of PDPs and ICE plots make the logic behind both the global and the local behaviour of our model clearer to users, implementers, auditors and decision recipients? How might these tools be used to improve the model and to ensure that it operates in accordance with reasonable expectations?
    • How can domain knowledge and understanding of the use case inform the insights derived from visualisation techniques? How might this knowledge inform the integration of visualisation techniques with other explanation tools?
    • What are the most effective ways that such visualisations can be presented and explained to users and decision recipients so as to help them build a mental model of how the system works, both as a whole and in specific instances? How can they be used to enhance evidence-based reasoning?
    • Are other visualisation techniques available (like heat maps, interactive querying tools for ANNs, or more traditional 2D tools like principal component analysis) that would also be helpful to enhance the interpretability of our system?
  • Understand the role of variables and variable interactions
    • How can global measures of feature importance and feature interactions be utilised to help users and decision recipients better understand the underlying logic of the model as a whole?
    • How might they provide reassurance that the model is yielding results that are in line with reasonable expectations?
    • How might they support and enhance the information being provided in the visualisation tools?
    • How might measures of variable importance and interaction effects be used to confirm that our AI system is operating fairly and is not harming or discriminating against affected stakeholders?
    • Which local, post-hoc explanation tools, such as LIME, SHAP or LOCO (Leave-One-Covariate-Out), are reliable enough in the context of our particular AI system to be useful as part of its portfolio of explanation extraction tools?
    • Have we established through model exploration and testing that using these local explanation tools will help us to provide meaningful information that is informative rather than misleading or inaccurate?
  • Understand how the behaviours or circumstances that influence an AI-assisted decision would need to be changed to change that decision
    • Are counterfactual explanations appropriate for the use case of our AI application? If so, have alterable features been included in the input space that can provide decision recipients with reasonable options to change their behaviour in order to obtain different results?
    • Have we used a solid understanding of global feature importance, correlations, and interaction effects to set up reasonable and relevant options for the possible alternative outcomes that will be explored in our counterfactual explanation tool?
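The third layer, understanding what would need to change to obtain a different decision, can be sketched as a simple search over alterable features only. The decision rule, threshold, feature names, step sizes and bounds below are all hypothetical. Dedicated counterfactual tools search far more carefully than this one-feature-at-a-time loop.

```python
def decide(case):
    """Hypothetical decision rule standing in for the model being explained."""
    score = (0.4 * case["income_k"]
             - 5.0 * case["missed_payments"]
             + 1.5 * case["years_employed"])
    return "approve" if score >= 10.0 else "decline"

# Only features the individual could realistically change, with assumed
# step sizes and plausible bounds. 'years_employed' is left out on purpose.
ALTERABLE = {
    "missed_payments": {"step": -1.0, "low": 0.0, "high": 10.0},
    "income_k": {"step": 5.0, "low": 0.0, "high": 200.0},
}

def counterfactuals(case, max_steps=5):
    """For each alterable feature, find the first value that flips the decision."""
    original = decide(case)
    found = {}
    for name, spec in ALTERABLE.items():
        value = case[name]
        for _ in range(max_steps):
            value += spec["step"]
            if not spec["low"] <= value <= spec["high"]:
                break  # stop rather than suggest an implausible value
            if decide(dict(case, **{name: value})) != original:
                found[name] = value
                break
    return found

case = {"income_k": 30.0, "missed_payments": 2.0, "years_employed": 1.0}
found = counterfactuals(case)
print(decide(case), found)
```

Restricting the search to alterable features and plausible values reflects the questions above: a counterfactual is only useful as recourse if it describes a change the decision recipient could reasonably make.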