In this guest blog, Professor Sir David Spiegelhalter, Emeritus Professor of Statistics at the University of Cambridge, reflects on his experiences in the Infected Blood Inquiry and the importance of transparency around statistical uncertainty.

In my latest book, The Art of Uncertainty, I discuss the UK Infected Blood Inquiry as a case study in communicating statistical uncertainty. In the 1970s and 1980s, tens of thousands of people who received contaminated blood products contracted diseases including HIV/AIDS and hepatitis. Many died as a result. This crisis, with its catastrophic consequences, was referred to as ‘the worst treatment disaster in the history of our NHS’.

The Infected Blood Inquiry was set up in 2018 after much campaigning by victims and their families. I was involved in the Statistics Expert Group established as part of the Inquiry.

Building a model for complex calculations

Our group was tasked with answering a number of questions surrounding the events, such as how many people had been infected with hepatitis C through contaminated blood transfusions.

Some conclusions were reached relatively easily, where we could be reasonably confident in the data and its verification: for example, that around 1,250 people with bleeding disorders were diagnosed with HIV from 1979 onwards.

Other figures proved much more difficult to estimate, such as the number of people who were infected with hepatitis C through ordinary blood transfusions before testing became available. We needed a more sophisticated approach that did not involve counting specific (anonymous) individuals but instead looked at the process as a whole. We therefore built a complex statistical model to derive the various estimates. However, data were lacking for some parts of the model, so expert judgement was at times needed to fill the gaps, and we had to account for multiple sources of uncertainty.
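To give a flavour of what this kind of uncertainty propagation can look like, here is a minimal sketch, assuming a deliberately simplified model: it is purely illustrative and is not the Inquiry's actual model. Each uncertain input, including those set by expert judgement, is given a probability distribution (all the numbers and distributions below are invented), and Monte Carlo simulation shows how the uncertainty feeds through to the final estimate.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000  # number of simulated scenarios

# Invented, purely illustrative inputs, each with its own uncertainty:
# number of transfusion recipients, known only roughly from records
recipients = rng.normal(1_000_000, 50_000, N)
# chance a transfused unit carried the virus: an expert-judgement
# prior, expressed here as a Beta distribution
p_unit_infected = rng.beta(2, 500, N)
# chance an infected unit led to infection, also expert judgement
p_transmission = rng.beta(8, 2, N)

# propagate all three sources of uncertainty through the model
infections = recipients * p_unit_infected * p_transmission

low, mid, high = np.percentile(infections, [2.5, 50, 97.5])
print(f"Median estimate: {mid:,.0f} "
      f"(95% interval: {low:,.0f} to {high:,.0f})")
```

Because the expert-judgement priors are vague, the resulting interval is wide: the honest summary is that wide range, not the single median figure.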

Using this model, we were able to produce numbers that went some way to answering the questions we had been charged with. However, some figures came with very large uncertainty, due to the inherent complexity involved in their calculation, so we could not be confident of their accuracy.

A scale for communicating uncertainty

To prevent people from placing undue trust in our findings, we wanted to convey the considerable caution with which our analysis should be treated. For this, we found the scale used in scientific advice during the COVID-19 pandemic a helpful model: confidence in a conclusion is expressed on a scale from low through to high.

This scale was liberating; it allowed us to clearly convey our level of confidence in a way that accurately reflected the reality of the numbers. So, we could say that we only had moderate confidence that the available data could answer some of the questions we had been asked. And for others – for example, how many people had been infected with hepatitis B – we refused to provide any numbers, on account of having low confidence in being able to answer the question.
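As a purely hypothetical sketch of how such a qualitative scale can sit alongside the numbers (the category names follow the low-to-high framing above, and the questions echo those mentioned in this post, but the pairings and wording are invented for illustration):

```python
# Hypothetical sketch: each finding carries a confidence rating,
# and where confidence is low, no number is reported at all.
findings = [
    ("HIV diagnoses among people with bleeding disorders",
     "around 1,250", "high"),
    ("hepatitis C infections from blood transfusions",
     "model-based estimate with a wide interval", "moderate"),
    ("hepatitis B infections", None, "low"),
]

for question, estimate, confidence in findings:
    if estimate is None:
        # low confidence: decline to give any figure
        print(f"{question}: no estimate offered (confidence: {confidence})")
    else:
        print(f"{question}: {estimate} (confidence: {confidence})")
```

The key design choice is that the qualitative rating travels with the number, so a reader never sees an estimate stripped of the caution that belongs to it.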

Lessons for the statistical community about communicating uncertainty

It can be difficult to admit to substantial uncertainty in data when dealing with a tragedy such as this. In the case of the Infected Blood Inquiry, this uncertainty meant that victims and their families could not receive precise answers to various questions for which they deserved some kind of closure.

It is also undeniably important, however, that those producing statistics are open about how confident they are in their numbers, so that people understand when statistics can reliably answer their questions, and when they cannot. Indeed, being transparent about any uncertainty in published data is one of the principles that the Office for Statistics Regulation (OSR) promotes in its intelligent transparency campaign and its championing of analytical leadership to support public understanding of, and confidence in, the use of numbers by government.

Intelligent transparency demands that statistical claims and statements are based on data to which everyone has equal access, are clearly and transparently defined, and come with appropriate acknowledgement of any uncertainties and relevant context. This concept helps us understand how to communicate our findings when we are asked to answer questions, whatever the quality of the available evidence. And it acknowledges that publishing numbers without appropriate context, clarifications and warnings undermines, rather than provides, real public value.

So, when it comes to communicating statistics to the public, honesty – or transparency, as we call it here – really is the best policy. I am delighted to see OSR placing more emphasis on intelligent transparency, and how statistics are communicated more generally, in its proposals for a refreshed Code of Practice. Ed Humpherson has also written an excellent blog on why communicating uncertainty is a constant challenge for statisticians.