Approaches to presenting uncertainty in the statistical system

Introduction

1.1 Why uncertainty matters

The aim of publishing statistics is to provide insight into the broad range of society’s questions, and to do this the statistics must be useful for their intended purpose. Uncertainty exists in statistics in various forms: for example, because of lags in administrative data systems, or because of limitations in data collected through sample surveys. An important part of ensuring the appropriate use of statistics, as guided by the Code of Practice for Statistics, is to make it clear that uncertainty exists, so that users can avoid drawing inappropriate inferences. This requires statisticians to understand and, where possible, calculate measures of uncertainty, and to communicate them in a way that a potentially diverse group of users can easily understand. Depending on the context, this may range from appropriate use of language or simple narrative descriptions of quality, through to a very detailed quantification of uncertainty. It is especially important to describe uncertainty where the quality of the statistics changes over time, for example as a result of new methods or changed data collection approaches, such as the changes we saw during and since the Covid-19 pandemic. This is an area that we will be focussing on further over the coming months. We consider uncertainty as we review statistics as part of our regulatory work, but this is the first time we have taken a retrospective look at our work on uncertainty over the past few years to try to draw broader insights.

1.2 What is uncertainty?

Uncertainty has many facets. The Winton Centre for Risk and Evidence Communication at the University of Cambridge distinguishes between direct and indirect uncertainty. Direct uncertainty is uncertainty expressed about the estimate (or fact) itself, without taking account of any caveats about the way the data were collected. Indirect uncertainty refers to the quality of the underlying knowledge that surrounds a claim about a fact, number or hypothesis. It is often communicated as a list of caveats about the underlying sources of evidence, or it can sometimes be summarised on a qualitative or ordered categorical scale, such as the GRADE scale for communicating the quality of evidence about the effects of medical interventions. For example, for the estimate of net migration in the UK in 2021, the direct uncertainty could be summarised by a range of values within which the true value is expected to lie, assuming a representative sample has been taken. The indirect uncertainty would describe how those data were collected, which groups may be missing or are likely to be under-reported, and so on.
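As a minimal sketch of how direct uncertainty can be summarised as a range, the Python below computes a symmetric interval around an estimate using a normal approximation. The function name `confidence_interval`, the normal approximation and the input figures are assumptions for illustration only: the estimate and standard error are hypothetical, not published net migration figures, and producers may use other interval methods suited to their data. Such an interval captures only quantifiable sampling error; it says nothing about indirect uncertainty, such as groups missing from the data.

```python
from statistics import NormalDist


def confidence_interval(estimate: float, standard_error: float,
                        level: float = 0.95) -> tuple[float, float]:
    """Return a symmetric normal-approximation interval around an estimate.

    Hypothetical helper for illustration; it reflects sampling error only,
    not biases from how the data were collected.
    """
    # Two-sided critical value, e.g. roughly 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical figures for illustration only, not published estimates
low, high = confidence_interval(estimate=100_000, standard_error=10_000)
print(f"about 100,000 (95% interval: {low:,.0f} to {high:,.0f})")
```

Pairing the interval with a word such as "about" in the headline figure keeps the estimate readable while signalling that it is not exact.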

It can also be helpful to split uncertainty into narrow and broad uncertainty. Narrow uncertainty concerns a specific claim about a defined quantity. It comprises both quantifiable statistical error and (usually unquantifiable) systematic biases due to data limitations. Its expression may take the form of quantified measures where available, or use of words such as “about”. Broad uncertainty relates to the relevance of the number to the wider question of interest. This may take the form of caveats due to ambiguity of terms, a particular metric being a limited measure of the thing of interest, or data simply not being available.

1.3 Communicating uncertainty

Communicating uncertainty is not always straightforward. The relevant aspects of uncertainty need to be presented concisely and clearly, in a way that makes interpretation easy but does not swamp the statistical messages or paint an overly negative picture of the statistics themselves. There are particular challenges in communicating uncertainty in statistical tables (particularly those that are user-defined) and in raw data files. Statistical producers are also likely to have varying degrees of understanding of the different aspects of uncertainty, and must target descriptions at a potentially diverse range of users from different backgrounds and with varying uses of the statistics in mind.

At the most basic level, it needs to be clear when numbers are being presented as estimates. The prominence and visibility of statements on uncertainty is another key consideration: a strong quantified statement about quality, for example, has less value if it is not readily accessible to users.

Descriptions of the uncertainty around estimates can appear in any of a producer’s published outputs or online interactive tools. Often, this information is found in quality and methods documents. But some indication of uncertainty is also needed in statistical reports, data tables, interactive maps, data dashboards and downloadable datasets, so that all users accessing the information understand how to use the statistics appropriately. Producers need to address these challenges to ensure that their statistics are used correctly and, in particular, that users do not draw false conclusions from them.

Our approach to evaluating the way that uncertainty is described uses the two axes of “what is said” and “where it’s said” to help judge whether information about uncertainty is adequate. Information about uncertainty needs to be both helpful and presented in a way that is accessible to those using the statistics or data. More details on our approach are included in Annex A.

1.4 Aims of the project

In this work we have drawn together what we know about existing guidance and practice across government for communicating uncertainty, along with insights from our own regulatory work. The aim has been to provide a range of examples of good practice to support statistical producers, and to help us improve the way that we regulate. This is only an initial exploration into the topic, and following this analysis, we will work with others to enhance existing guidance where possible and then promote the outcomes to the Government Statistical Service (GSS) and the wider Analysis Function.

We want producers to be equipped to measure and evaluate uncertainty in their statistics. We also want them to have a framework, guidance and examples of good practice to use when considering the implications of uncertainty for the use of their statistics, and then to communicate that uncertainty in a way that enhances the use of their statistics and reduces the potential for misuse.

Further details on how we gained insight into our work on uncertainty are given in Annex B.

The remainder of this report outlines some of the tools and resources currently available on uncertainty, and then sets out what we found from reviewing our published work over the last few years. We include a range of case studies.
