Glossary

Latest updates

15 March 2023 - This chapter contains both existing and new content. It covers the following terms:

Affinity groups, algorithmic fairness, algorithmic fairness constraints, bias mitigation algorithm, causality, confidence interval, correlation, cost function, dataset labellers, decision space, construct space and observed space, decision boundary, decision tree, downstream effects, ground truth, inductive bias, in-processing, hyperparameters, multi-criteria optimisation, objective function, post-processing bias mitigation, regularisation, redundant encodings, reward function, use case, target variable, variance.

Term Meaning
Affinity groups Groups created on the basis of inferred interests rather than the personal traits of the individuals comprising them. They are also often described as “cohorts” or “ad-hoc groups”.
AI development tools Services that allow clients to build and run their own models, with data they have chosen to process, but using the tools and infrastructure provided by a third party.
AI prediction as a service A service that provides live prediction and classification services to customers.
Algorithmic fairness An active field of research that involves developing mathematical techniques to measure, and reduce, the potentially discriminatory ways in which ML models treat individuals from different groups.
Algorithmic fairness constraints These are constraints you put in place while training a model in order to embed algorithmic fairness into its objective function.
Application Programming Interface (API) A computing interface which defines interactions between multiple software intermediaries.
Automation bias Where human users routinely rely on the output generated by a decision-support system and stop using their own judgement or stop questioning whether the output might be wrong.
Bias mitigation algorithms Processes to remove unwanted bias in data or models.
Black box A system, device or object that can be viewed in terms of its inputs and outputs, without any knowledge of its internal workings.
Black box attack Where an attacker has the ability to query a model and observe the relationships between inputs and outputs but does not have access to the model itself.
Black box problem The problem of explaining a decision made by an AI system in a way that the average person can understand.
Causality The principle that one variable (X) - an independent variable - produces change in another variable (Y), called a dependent variable. To establish causation, the two variables must be associated or correlated with each other, and non-causal, ‘spurious’ explanations for the relationship must be eliminated. Because events in the real world are usually too complex to be explained by a single causal relationship, the principle of multiple causation also needs to be considered: more often than not, a combination of causal relationships is in operation.
Confidence interval A range of values that describes the uncertainty surrounding an estimate for an unknown parameter - the variable your AI system is trying to predict.
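
For illustration, a minimal Python sketch (using invented sample values and a normal approximation) of a 95% confidence interval for a sample mean:

  import numpy as np

  # Invented sample of an unknown quantity being estimated.
  samples = np.array([2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1])

  mean = samples.mean()
  # Standard error of the mean: sample standard deviation / sqrt(n).
  sem = samples.std(ddof=1) / np.sqrt(len(samples))
  # 1.96 is the approximate z-value for 95% confidence.
  print(f"95% CI: [{mean - 1.96 * sem:.2f}, {mean + 1.96 * sem:.2f}]")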
Concept/model drift Where the domain in which an AI system is used changes over time in unforeseen ways leading to the outputs becoming less statistically accurate.
Constrained optimisation A number of mathematical and computer science techniques that aim to find optimal solutions for minimising trade-offs in AI systems.
Correlation The relationship between two variables, where we can predict one variable from the other.
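
As an illustration (with invented data), the Pearson coefficient measures how well one variable linearly predicts the other:

  import numpy as np

  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

  # +1 or -1 indicates a perfect linear relationship; 0 indicates none.
  r = np.corrcoef(x, y)[0, 1]
  print(f"Pearson r = {r:.3f}")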
Cost function An aspect of a learning process which attaches a cost to certain kinds of behaviours (eg errors) to help with achieving its objective function.
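
A minimal sketch of one common cost function, mean squared error, which attaches a larger cost to larger errors (the example values are invented):

  import numpy as np

  def mse_cost(y_true, y_pred):
      # Mean squared error: the average of the squared differences
      # between true values and predictions.
      return np.mean((y_true - y_pred) ** 2)

  print(mse_cost(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])))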
Dataset labellers Individuals that label the training data so that an ML algorithm can learn from it.
Decision space, construct space and observed space

These spaces denote different levels at which a problem can be examined:

  • the construct space relates to unobservable variables;
  • the observed space relates to observed features; and
  • the decision space is the set of actions available to a decision-maker.
Decision boundary A threshold that separates data into different classes. For example, the boundary that separates loan applicants that will be rejected from those that will be accepted.
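
A minimal sketch of a one-dimensional decision boundary for the loan example above; the threshold value is hypothetical:

  THRESHOLD = 0.5  # hypothetical boundary learned during training

  def classify(score: float) -> str:
      # Applicants on one side of the boundary are accepted,
      # those on the other side rejected.
      return "accept" if score >= THRESHOLD else "reject"

  print(classify(0.72), classify(0.31))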
Decision tree A model that uses inductive branching methods to split data into interrelated decision nodes which end in classifications or predictions. Decision trees move from starting ‘root’ nodes to terminal ‘leaf’ nodes, following a logical decision path that is determined by Boolean-like ‘if-then’ operators that are weighted through training.
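
A minimal sketch of training and printing a small decision tree, using scikit-learn and a built-in dataset purely for illustration:

  from sklearn.datasets import load_iris
  from sklearn.tree import DecisionTreeClassifier, export_text

  X, y = load_iris(return_X_y=True)
  # Each internal node applies an 'if-then' split on one feature;
  # the 'leaf' nodes hold the final classifications.
  tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
  print(export_text(tree))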
Deep learning A subset of machine learning where systems ‘learn’ to detect features that are not explicitly labelled in the data.
Differential privacy A system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.
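
One common building block is the Laplace mechanism; a minimal sketch (the records and epsilon value are invented) of releasing a noisy count:

  import numpy as np

  rng = np.random.default_rng()

  def dp_count(records, epsilon=1.0):
      # A counting query changes by at most 1 if any single individual
      # is added or removed, so Laplace noise with scale 1/epsilon
      # masks each individual's presence in the dataset.
      return len(records) + rng.laplace(scale=1.0 / epsilon)

  over_40 = [age for age in [23, 45, 51, 38, 62] if age > 40]
  print(dp_count(over_40))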
Downstream effects The impact(s) of an AI system on individuals once it is deployed.
False negative (‘type II’) error When an AI system incorrectly labels cases as negative when they are positive.
False positive (‘type I’) error When an AI system incorrectly labels cases as positive when they are negative.
Feature selection The process of selecting a subset of relevant features for use in developing a model.
Federated learning A technique which allows multiple different parties to train models on their own data (‘local’ models). They then combine some of the patterns that those models have identified into a single, more accurate ‘global’ model, without having to share any training data with each other.
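
A minimal sketch of the combination step, assuming simple federated averaging over hypothetical local weight vectors:

  import numpy as np

  # Hypothetical weights from three locally trained models.
  local_weights = [
      np.array([0.9, 1.8]),
      np.array([1.1, 2.1]),
      np.array([1.0, 2.0]),
  ]

  # Average the local models into a single global model; no party's
  # raw training data is ever shared.
  global_weights = np.mean(local_weights, axis=0)
  print(global_weights)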
Ground truth At a high level, the reality a model intends to predict.
Inductive bias The assumptions an algorithm is built on. It plays a role in the ability of a model to generalise when faced with new data.
In-processing A series of techniques to intervene in the model during its training process, such as by adding additional constraints or regularisation terms to its learning process.
Hyperparameters Configurations of the parameters of an algorithm or learning process. Parameters are rules, constraints or assumptions that developers provide to an algorithm for training in order to deliver a functioning model for their specific use case.
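
A minimal sketch of choosing hyperparameters via grid search, using scikit-learn and a built-in dataset for illustration; here max_depth and min_samples_leaf are hyperparameters rather than values learned from the data:

  from sklearn.datasets import load_iris
  from sklearn.model_selection import GridSearchCV
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  # Try each combination of settings and keep the best performer.
  search = GridSearchCV(
      DecisionTreeClassifier(random_state=0),
      param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]},
      cv=3,
  )
  search.fit(X, y)
  print(search.best_params_)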
‘K-nearest neighbours’ (KNN) models  An approach to data classification that estimates how likely a data point is to be a member of one group or the other depending on what group the data points nearest to it are. KNN models contain some of the training data in the model itself.
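
A minimal scikit-learn sketch (built-in dataset, illustrative only); note that the fitted model retains the training points themselves:

  from sklearn.datasets import load_iris
  from sklearn.neighbors import KNeighborsClassifier

  X, y = load_iris(return_X_y=True)
  # A new point is classified by majority vote among its
  # k nearest training points (here k = 3).
  knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
  print(knn.predict(X[:2]))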
Lack of interpretability Where an AI system’s outputs are difficult for a human reviewer to interpret.
Local Interpretable Model-agnostic Explanation (LIME) An approach to addressing low interpretability which provides an explanation of a specific output rather than the model in general.
Machine learning (ML) The set of techniques and tools that allow computers to ‘think’ by creating mathematical algorithms based on accumulated data.
Membership inference attack An attack which allows actors to deduce whether a given individual was present in the training data of a machine learning model.
Model inversion attack An attack where attackers already have access to some personal data belonging to specific individuals in the training data, but can also infer further personal information about those same individuals by observing the inputs and outputs of the machine learning model.
Multi-criteria optimisation A mathematical approach that aims to satisfy multiple criteria in the process of decision-making.
Objective function The goal that a machine learning algorithm is trying to achieve (eg ‘minimise errors’).
Perturbation Where the values of data points belonging to individuals are changed at random whilst preserving some of the statistical properties of those features in the overall dataset.
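
A minimal sketch with invented salary values: each individual's value changes, while the dataset's mean is approximately preserved:

  import numpy as np

  rng = np.random.default_rng(seed=0)
  salaries = np.array([31000.0, 45000.0, 52000.0, 38000.0])

  # Zero-mean Gaussian noise alters individual records at random
  # but leaves aggregate statistics roughly intact.
  perturbed = salaries + rng.normal(loc=0.0, scale=1000.0, size=salaries.shape)
  print(salaries.mean(), perturbed.mean())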
Post-processing bias mitigation A series of techniques applied to the initial model after its original training.
Precision The percentage of cases identified as positive that are in fact positive (also called ‘positive predictive value’).
Pre-processing The process of transforming data prior to using it for training a statistical model.
Privacy enhancing technologies (PETs) A broad range of technologies that are designed for supporting privacy and data protection.
Programming language A formal language comprising a set of instructions that produce various kinds of outputs and that are used in computer programming to implement algorithms.
Query A request for data or information from a database table or combination of tables.
Recall (or sensitivity) The percentage of all cases that are in fact positive that are identified as such.
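
A minimal sketch computing precision (defined above) and recall from invented labels and predictions:

  # 1 = positive, 0 = negative (invented example data).
  y_true = [1, 1, 1, 0, 0, 1, 0, 0]
  y_pred = [1, 1, 0, 1, 0, 1, 0, 0]

  tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
  fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # type I errors
  fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # type II errors

  precision = tp / (tp + fp)  # share of predicted positives that are correct
  recall = tp / (tp + fn)     # share of actual positives that are found
  print(precision, recall)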
Redundant encodings Patterns encoded in complex combinations of features.
Regularisation A method to reduce overfitting to training data, particularly when the training data is scarce or known to be incomplete.
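
A minimal sketch contrasting an unregularised linear model with ridge regression, whose L2 penalty shrinks coefficients to reduce overfitting (the data is randomly generated for illustration):

  import numpy as np
  from sklearn.linear_model import LinearRegression, Ridge

  rng = np.random.default_rng(seed=0)
  X = rng.normal(size=(20, 5))
  # Only the first feature truly matters; the rest invite overfitting.
  y = X @ np.array([1.0, 0.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=20)

  print(LinearRegression().fit(X, y).coef_)
  print(Ridge(alpha=1.0).fit(X, y).coef_)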
Reward function Used in the context of reinforcement learning, where a model is given a reward when it learns the intended behaviour and is penalised, via a cost function, when it does not.
Statistical accuracy The proportion of answers that an AI system gets correct.
Supervised machine learning A machine learning task of learning a function that maps an input to an output based on examples of correctly labelled input-output pairs.
Support Vector Machines (SVMs) A method of separating out classes by using a line (or hyperplane) to divide a plane into parts, with each class lying on either side.
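
A minimal scikit-learn sketch on invented, linearly separable data; the fitted coefficients and intercept define the separating hyperplane:

  from sklearn.datasets import make_blobs
  from sklearn.svm import SVC

  X, y = make_blobs(n_samples=40, centers=2, random_state=0)

  # A linear SVM chooses the hyperplane separating the two classes
  # with the widest possible margin.
  svm = SVC(kernel="linear").fit(X, y)
  print(svm.coef_, svm.intercept_)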
Upstream effects The impact(s) of an AI system on individuals at the early stage(s) of its development.
Use case An AI application or the problem an AI system intends to solve.
Target variable The outcome that an AI system seeks to predict.
Variance The extent to which a model is overfitted to the data it is trained on. High variance means the model is more likely to fail when presented with new examples that are different from the training data. Ultimately, variance is used to understand how reliable a model is in its performance.
‘Virtual machines’ or ‘containers’ Emulations of a computer system that run inside, but isolated from, the rest of an IT system.
‘White box’ attack  Where an attacker has complete access to the model itself, and can inspect its underlying code and properties. White box attacks allow additional information to be gathered (such as the type of model and parameters used) which could help an attacker infer personal data from the model.