Annex A: Glossary

Here is a glossary of terms used in this guidance. This glossary includes terms referred to in the Cabinet Office’s Data Ethics Framework: glossary and methodology and the Code of Practice for Statistics.


Algorithm

A set of step-by-step instructions. Computer algorithms can be simple (if it’s 3pm, send a reminder) or complex (identify pedestrians).

(Source: Matthew Hutson (2017) ‘AI Glossary: Artificial Intelligence in so many words’)
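As an illustration only, the simple example above (if it’s 3pm, send a reminder) could be written as a short Python sketch. The function names `reminder_due` and `act` are hypothetical and chosen for this example, and the sketch assumes that "3pm" means exactly 15:00 on a 24-hour clock.

```python
from datetime import datetime


def reminder_due(now: datetime) -> bool:
    # Step 1: read the hour and minute from the clock (3pm is hour 15).
    # Step 2: compare them with the target time (assumed to be exactly 15:00).
    return now.hour == 15 and now.minute == 0


def act(now: datetime) -> str:
    # Step 3: decide what to do based on the comparison.
    return "send reminder" if reminder_due(now) else "do nothing"
```

Calling `act(datetime.now())` once a minute would carry out the instruction. A complex algorithm such as pedestrian identification follows the same idea of explicit steps, but with far more of them.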

Artificial Intelligence (AI)

AI can be defined as the use of digital technology to create systems capable of performing tasks commonly thought to require intelligence. AI is constantly evolving, but generally it:

  • involves machines using statistics to find patterns in large amounts of data
  • is the ability to perform repetitive tasks with data without the need for constant human guidance

(Source: GDS, OAI (2019) ‘A guide to using artificial intelligence in the public sector’)

Code of Practice

Shortened version of the Code of Practice for Statistics. Not to be confused with computer ‘code’, which relates to a collection of instructions that can be executed by a computer to perform a specific task.

(Source: Code of Practice for Statistics)


Data

In general, data can be understood as discrete values and statistics collected together for reference or analysis.

When we refer to data, we mean both data about people generated through their interactions with services, and also data about systems and infrastructure such as businesses and public services. Data can be operational (collected in the process of running services or businesses), as well as analytical and statistical.

(Source: National Data Strategy 2020)

Personal data means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

(Source: ICO, GDPR)

Data Science

Data science describes analysis using automated methods to extract knowledge from data. It covers a range of techniques, from finding patterns in data using traditional analytics to making predictions with machine learning.

(Source: Data Ethics Framework 2018)

Machine Learning

Machine learning is a subset of AI, and refers to the development of digital systems that improve their performance on a given task over time through experience. Machine learning is the most widely used form of AI, and has contributed to innovations like self-driving cars, speech recognition and machine translation.

(Source: GDS, OAI (2019) ‘A guide to using artificial intelligence in the public sector’)


Model

A model, in computing terms, is a physical, mathematical, or otherwise logical representation of a system, entity, phenomenon, or process.

(Source: US Department of Defence)

The word covers a broad range of uses. It may be used to refer to an entire system, such as a statistical model, mathematical model or conceptual model, or a component part of an AI system trained on a set of data, i.e. an AI model.

Official Statistics

Statistics produced by crown bodies, those acting on behalf of crown bodies, or those specified in statutory orders, as defined in section 6 of the Statistics and Registration Service Act 2007.

(Source: Code of Practice for Statistics)

Public Good

Defined in the Statistics and Registration Service Act 2007 in terms of the Authority’s statutory objective to promote and safeguard the production and publication of official statistics that serve the public good. This includes informing the public about social and economic matters; assisting in the development and evaluation of public policy; and regulating quality and publicly challenging the misuse of statistics.

(Source: Code of Practice for Statistics)


Statistics

A collection of measures about a particular attribute compiled from a set of data. Statistics are used for making generalisations or inferring conclusions about particular attributes, at an aggregate level, for example, about a particular subset of the population.

(Source: Code of Practice for Statistics)

Traditional statistical techniques

A set of techniques that are considered traditional/classic in the field of statistics. Techniques include: the univariate t-test, ANOVA, ANCOVA, Pearson’s r, Spearman rho, phi, and multiple regression; the multivariate Hotelling’s T², MANOVA, MANCOVA, descriptive discriminant analysis, and canonical correlation analysis.

(Source: Bruce Thompson (2013) ‘Overview of Traditional/Classical Statistical Approaches’)
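To make one of the listed techniques concrete, the following is a minimal Python sketch of Pearson’s r, computed from its standard definition (the covariance of two variables divided by the product of their standard deviations). The function name `pearson_r` is an assumption made for this example; it is not taken from the guidance or from any particular library.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")

    return sxy / sqrt(sxx * syy)
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little or no linear relationship.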
