It’s beginning to look a lot like Census…

It may be too early to talk about Christmas for some – not for me. I have decided on the design theme, got my advent calendar, am close to finalising the Christmas menu and have started ordering gifts. I am all over it! And getting more excited by the day. 

Christmas means different things to different people, but it is certainly my favourite census-related celebration. Weren’t Mary and Joseph off to Bethlehem to register as part of a census? It is timely, then, that as part of my (possibly a bit too keen) Christmas preparations, I am pleased to say we have published our phase 2 assessment reports for the 2021 Census in England and Wales and the 2021 Census in Northern Ireland. 

If you read my previous blog, you’ll know I have been leading OSR’s work on the Censuses in the UK, and today is a bit of a milestone moment for me as part of this assessment. The publication of these reports is the culmination of a range of work which kicked off three or four years ago, and it has been so interesting and rewarding to speak with users and stakeholders of Census data and statistics throughout – it’s been an absolute gift!  

Our reports recognise the efforts of the Office for National Statistics and the Northern Ireland Statistics and Research Agency in delivering Census live operations and their continuing work to produce high quality, extremely valuable Census data and statistics. I wanted to take the opportunity to specifically thank all of the individuals and teams who have taken the time to engage with me and my colleagues as part of this assessment process. You have been open with us, kept us up to date with developments and taken on board our feedback throughout the process – all the while getting on with the more important job at hand: working on the Census itself. 

As we get closer to the festive season, I wish you all a well-deserved break and raise a glass of sparkling elderflower pressé to one and all.  

Related links:

Assessment of compliance with the Code of Practice for Statistics – 2021 Census in England and Wales

Assessment of compliance with the Code of Practice for Statistics – 2021 Census in Northern Ireland

Volunteering at COP26 – being part of something big

I felt immensely excited and proud to be one of the 1,000 volunteers helping at COP26 in my home city of Glasgow. Volunteers came from different corners of the globe to perform a range of tasks. I was a shuttle hub volunteer, which means I helped those going to COP26 get there and back by shuttle bus from the key transport hubs – over 7,000 people on one day in the hours I was there.

What gets me out of bed in the morning is the feeling of being part of a greater whole. COP26 made me realise that climate change and net zero are the bigger picture. As VIPs with their outriders whizzed by me, I was focussed on the people keen to get their day going, like the lady who had travelled for days from the islands of Palau, or the man from Samoa, making sure their voices were heard alongside the more well-known world leaders. The volunteers logged the parts of the world from which they had welcomed people to Glasgow – covering almost all of the 196 nations at COP26.

Glaswegians had varied responses to COP26. Most saw it as a great opportunity for the world. Some reflected the views of Greta Thunberg, that it’s more about “blah blah blah” than substance. Some bemoaned the inconvenience to their daily routine, although Glaswegians are renowned for their humour and make a joke of any temporary disruption to their day.

Coming from a mining family, I started my career in the coal industry, and was enormously proud to work in a nationalised industry. A phrase I heard a lot back then was that Britain was an island founded on coal and surrounded by fish.

I was one of 20 young graduates who joined the National Coal Board (NCB) in 1979, and many, like me, came from coalfield communities. Early in 1980, I was sent on behalf of the NCB’s Economics Intelligence Unit to monitor the Select Committee on Energy at the House of Commons and feed back on its inquiry into alternative sources of energy. I reported back that I didn’t see renewables taking off or being any competition to coal. I was deeply attached to coal as a nationalised resource. The prevailing economic thinking at the time was the mixed economy – nationalised industries, public corporations and private enterprise.

I so wish now that I could have heard statistics about the threat to the planet (those statistics came much later) and the imperative of reducing our dependency on fossil fuels and ramping up clean alternatives. At that time, in the early 1980s, the response to the environmental problem of burning 120m tonnes of coal in the UK was to make coal a little cleaner. I was astonished to learn that more coal is produced in the world today (~7.6 billion tonnes) than in early 1980 (~3.8 billion tonnes), according to statistics from the International Energy Agency.

Climate change statistics are important, but change will rely on leadership

During these weeks at COP26, I saw and heard the leaders and politicians on TV, on the radio and through the media. As I spoke with those going to COP26, I heard their hopes and fears. I realised that climate change is about more than the science; political leadership is crucial. Leaders need to be skilled in the appropriate use of data as part of their role in persuading the public to make behavioural changes and embrace the consequences. E. M. Forster said: “Only connect!… Only connect the prose and the passion, and both will be exalted.” Leaders connect the prose and the passion, and need to skilfully deploy the statistics. Leaders can give statistics a social life.

As we’ve learned from the Covid-19 pandemic, the role of senior leaders in government is also vital in providing support to statisticians. The lessons learned throughout the pandemic should show us how to better help people to understand the choices open to us all to respond to the challenges of net zero.

Statistics play a key role in helping the public understand the bigger picture on climate change. They are also essential for helping governments design and monitor policies that reduce or prevent greenhouse gas emissions (mitigation) and prepare us for the expected impacts of climate change (adaptation). Because it’s such a complex topic, climate change statistics add insight when related sets of statistics are brought together to tell a clear story of what is happening and why. They also add value when they are presented in ways that help different types of users understand what the data is telling us, and when users can access the information they need. Our recent review of the UK’s climate change statistics looked at exactly these two things: their coherence and accessibility.

I’m nostalgic for a Britain in which people saw themselves as part of a greater whole and realised that we depend on each other. We need to connect with each other locally and globally, and in doing so we’ll appreciate being part of something much bigger than ourselves.


Will public interest in statistics last?

We recently hosted a lively public event where we engaged with a varied audience, answering questions about our work during the pandemic. We are delighted that the public is taking such a keen interest in statistics, data and evidence.

We work hard to ensure that the numbers needed and consumed by the public stand up to scrutiny and meet the standards of the Code of Practice for Statistics, offering ‘trustworthiness, quality and value’ to anyone using them.

Before the pandemic, our relationship with the general public (endearingly referred to here as ‘armchair epidemiologists’) was predominantly through third-party intermediaries: the media, government statistical producers, specialist user groups, academia, think tanks and the third sector, to name but a few.

Understanding of the importance of statistics and public interest in them has changed, with people embracing statistics and realising how others can use or potentially misuse them.

This is shown in the number of cases we have considered: in the period from April 2020 to March 2021, we considered nearly three times as many cases as in the previous year. 76% of the cases we looked into were related to the pandemic, and 48% related to the quality, reliability and trustworthiness of statistics – the first time this has been the most common category. This makes what we do much simpler to explain, as we continue to challenge the use of any official statistics that raise issues or cause concern.

We see statistics as an asset that frames our understanding of the world, helps inform our choices and provides a starting point for debate. This could be on the size of the economy, the number of people in the country, the rate of crime, or the health and well-being of the population, as well as on more specific topics such as the levelling up agenda or statistics on climate change.

But this isn’t just for institutional decision makers like the Bank of England or a Secretary of State. Statistics also support the choices made every day by a very wide range of people: individuals, businesses, community groups and so on.

We are considering how we should go beyond intermediaries and engage more directly than we already do with the general public on the issues we care about. As part of our ongoing work and commitment as the statistics regulator, we encourage you to follow us on Twitter, read our newsletter, visit our website and contact us with any thoughts or questions you might have.

Communicating data is more than just presenting the numbers

There has been a lot of talk about a UK Health Security Agency (UKHSA) technical report. It includes information on COVID-19 case rates in England for vaccinated and unvaccinated groups (Table 5). For some, the immediate reaction to these data has been outright disbelief; others have used the data to support pre-existing, and incorrect, views that vaccines are not effective. Neither of these reactions is right. Understanding the issues properly is extremely complex, but what we do know with some certainty is that while the vaccine will not stop the spread of the virus completely, it has been shown to improve outcomes.

We have seen considerable improvements to the presentation of the data in the latest UKHSA report, which should support better interpretation of these data in future. However, the episode provides an important lesson about the harm that can be done if data are not clearly presented and well explained. There is more to communicating data than just presenting the numbers. Producers of data should do everything they can to minimise the potential for misinterpretation or misuse.

As well as presenting data clearly, producers need to guide users on how the data they publish can, and perhaps most importantly cannot, be used. They also need to explain the choices made in producing the analysis, along with the implications of those choices. Producers of data are the experts in the data they publish; their insight and understanding of these data can help ensure appropriate interpretation and use.

To return to the example of COVID-19 case rates by vaccination status, the choices made by UKHSA have been a source of much debate because their consequences are so significant. It is important these choices and their implications are explained, and it is perhaps even more important that the guidance on interpretation of the data is clear. As an organisation of experts, UKHSA is in a position to help the public understand what the data mean, including explaining why the findings may not be as counterintuitive as they first appear, while leaving no uncertainty around its view that vaccines are effective. UKHSA can help explain where there is value in using the data (e.g. looking at changes in case rates over time or across age bands within the vaccinated group) and where there is not (e.g. understanding vaccine effectiveness).

Guidance on interpretation

The report is clear that a direct comparison of rates for the vaccinated and unvaccinated groups is not appropriate. The report opens with the text:

“These raw data should not be used to estimate vaccine effectiveness as the data does not take into account inherent biases present such as differences in risk, behaviour and testing in the vaccinated and unvaccinated populations.”

This is a helpful health warning. By thinking about questions people may have and how they may try to use data, producers can pre-empt potential issues and provide explanation that will support appropriate interpretation. In the context of COVID-19 case rates some of the key questions might be:

  • What are the data measuring? And is this what people want to know about? Many people want to know the risk of COVID-19 infection for those who are double vaccinated compared with those who have not had a vaccine. The data in UKHSA’s Table 5 do not show this: they do not show infection rates in the population. The table shows case rates, i.e. the rate of COVID-19 positive tests among those who come forward for testing – something the government continues to ask people to do[1]. As a result, the data may have inherent biases.
  • Are the two groups in question comparable? It is easy to see that there may be different behaviours in people who have had two vaccinations compared with those who have had none. One hypothesis is that those with two vaccinations are more likely to get tested, meaning the case rates will look relatively higher for this group compared with the unvaccinated group. There will also be different risks associated with each group: the vaccination programme prioritised vulnerable groups and frontline health and social care workers, so the vaccinated group includes those who are more at risk of infection. We haven’t seen evidence to quantify the impact of these risks and behaviours, but it’s likely there will be an impact.
  • Are there other sources of data which could be considered? There are increasingly other sources of information which demonstrate that vaccines are highly effective. The UKHSA has done a significant amount of research and analysis into vaccines. This is outlined in the vaccine surveillance report, which sets out effectiveness against different outcomes (infection, symptomatic disease, hospitalisation and mortality). There is further information via UKHSA’s monitoring of the effectiveness of COVID-19 vaccination. In addition, the Office for National Statistics (ONS) has published an article on the impact of vaccination on testing positive, as well as an article on deaths involving COVID-19 by vaccination status. All of these examples take into account the characteristics of those in the sample and try to adjust for differences. As a result, they offer a better indication of vaccine effectiveness.

Implications of choices

To undertake any analysis, choices need to be made. These choices and their implications should be explained.

In the calculation of COVID-19 case rates, the most significant choice is which population estimates to use to calculate the rates (“the denominator”). There are two obvious choices: the National Immunisation Management Service (NIMS) or the ONS mid-year population estimates. Each source has its strengths and limitations, and we don’t yet know the true figure for the denominator.

In the context of case rates, the choice of denominator becomes even more significant than it is for take-up rates, because the number of people with no vaccine is low. The population of people vaccinated is relatively well known: NIMS includes all those who have been vaccinated and is a good estimate of this population.

The difficulty comes in understanding the total number of the population who have not had a vaccination. There are many advantages to using NIMS, not least because it is consistent with international approaches to considering immunisations and allows for analysis which would not be possible using aggregate population estimates. However, we also know that NIMS overestimates the population. Similarly, there are strengths in using ONS mid-year estimates, but we know these can have particular difficulties for small geographic breakdowns. We also know that the time lag created by using mid-year 2020 estimates has a disproportionate impact in older age groups – for example, it means that in more granular age bands some older age groups show more people having been vaccinated than the ONS population suggests exist. There is more information on the strengths and weaknesses of each source in NHSE&I’s notes on denominators for COVID-19 vaccination statistics. The chief statistician at Welsh Government has published a blog which outlines the impact of the different choices for vaccine uptake in Wales.

Looking just at the adult population, Figure 1 shows the different results which come from using the two different denominator options for the population who have never had a COVID-19 vaccine.

 

Figure 1: COVID-19 case rates per 100,000, England by age band


*see notes at end for the data and more details of sources used in this figure.

Using the NIMS denominator, the positive case rate for those who are not vaccinated is below the rate for those who are vaccinated (except in the 18-29 age band). Using the ONS mid-year 2020 estimates as the denominator, the positive case rate for those who are not vaccinated is higher than for those who are vaccinated (in all age groups below 80). While we don’t yet know the true figure for the unvaccinated population, this seemingly simple choice has a huge impact. It is particularly problematic in this circumstance because any error in the total population estimate is applied in its entirety to the unvaccinated population.

As an example, for the 70 to 79 population, the NIMS figure is just 4% higher than the ONS mid-year estimate (5.02 million and 4.82 million respectively). These figures can then be used in combination with the data on total people vaccinated from NIMS to estimate the total number of people not vaccinated. In doing this, the difference of nearly 200,000 in the total population estimates is applied entirely to the relatively small number of 70 to 79 year olds who are not vaccinated. It means the NIMS-based estimate of the unvaccinated population in the 70 to 79 age band is 363% higher than the estimate based on the ONS mid-year figures. So, a total population estimate just 4% higher has produced an estimate of the unvaccinated population 363% higher at that age band. This has a huge impact on the case rates for this group, and on the conclusions drawn from the data.
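
To make that amplification concrete, here is a minimal sketch in Python. The two totals are the figures quoted above; the vaccinated count is an illustrative round number chosen to be consistent with the published percentages, not a UKHSA figure, and the case count is hypothetical.

```python
# Sketch: how a 4% gap between two total population estimates becomes a
# gap of several hundred percent in the implied unvaccinated population
# (70-79 age band). Totals are from the text; the vaccinated count is
# an illustrative round number, not a published UKHSA figure.
total_nims = 5_020_000   # NIMS total population estimate, 70-79
total_ons = 4_820_000    # ONS mid-year population estimate, 70-79
vaccinated = 4_765_000   # assumed vaccinated count (illustrative)

# The entire difference between the totals lands on the unvaccinated group
unvax_nims = total_nims - vaccinated   # 255,000
unvax_ons = total_ons - vaccinated     # 55,000

print(f"Gap in totals:       {(total_nims - total_ons) / total_ons:.0%}")   # 4%
print(f"Gap in unvaccinated: {(unvax_nims - unvax_ons) / unvax_ons:.0%}")   # 364%

# Any case count divided by these two denominators differs by the same factor,
# which is why the unvaccinated case rates in Figure 1 sit so far apart.
cases = 1_500  # hypothetical positive tests among unvaccinated 70-79s
print(f"Rate per 100k (NIMS denominator): {100_000 * cases / unvax_nims:.0f}")  # ~588
print(f"Rate per 100k (ONS denominator):  {100_000 * cases / unvax_ons:.0f}")   # ~2727
```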

An understanding of the impact of choices is essential in supporting appropriate interpretation of the data. In this scenario, we don’t have enough information to know the true figure for the unvaccinated population in each age group. We hope that the work UKHSA is doing to improve the NIMS data (including removing duplicates) along with the work ONS is doing on population estimates and the 2021 Census, will improve our understanding. It is really positive that ONS and UKHSA are working together to try and solve this issue, which is so important across so many statistics. Given this uncertainty, knowledge of the implications of the different choices can help users interpret the presented data with caution.

The message from all this is that data have huge value, but it is also really important that those publishing data consider how the data may be used and what their strengths and limitations are, and think about how to present the data in a way that minimises the potential for misuse.

In time, there will be changes to what we know about the population, and producers of analysis should not shy away from updating their outputs when new evidence comes to light. In doing this, they should also clearly explain any changes and choices, as this transparency will support trust and understanding. It will help the good data shine out and ensure statistics serve the public good.

[1] https://www.nhs.uk/conditions/coronavirus-covid-19/testing/get-tested-for-coronavirus/

 

Data, sources and notes for case rate calculations:

Table 1: COVID-19 case rates per 100,000, England by age band

Age band | Rates among people vaccinated (2 doses) (NIMS) | Rates among people not vaccinated (NIMS) | Rates among people not vaccinated (ONS)
18-29 | 546.0 | 671.3 | 1089.9
30-39 | 1084.3 | 816.5 | 2159.1
40-49 | 1945.2 | 834.0 | 2207.1
50-59 | 1252.1 | 585.5 | 1907.2
60-69 | 837.6 | 390.7 | 1964.7
70-79 | 636.1 | 311.8 | 1443.7
80 or over | 434.1 | 334.1 | 312.1
  1. The calculations in Figure 1 and Table 1 are based on publicly available data. There are slight variations compared to the UKHSA data in Tables 2 and 5. For example, it is not clear exactly what date UKHSA used for the total vaccinated or unvaccinated populations, and there are regular revisions to the published data on the coronavirus dashboard, which may have caused changes since the data were used in the UKHSA analysis. These differences are not big enough to affect the conclusions in this blog.

  2. NIMS data from the coronavirus dashboard download, downloaded 31 October 2021. Metric downloaded: “vaccinationsAgeDemographics”. Variables used:
  • Total population = VaccineRegisterPopulationByVaccinationDate (for 24 October)
  • Total vaccinated population = cumPeopleVaccinatedSecondDoseByVaccinationDate (for 10 October, i.e. 14 days before the end of the period, as only cases with vaccination 14 days or more before a positive test are included in the UKHSA table of cases)
  • Total not vaccinated = Population estimate – cumPeopleVaccinatedFirstDoseByVaccinationDate

  3. Total number of cases by age, and cases by vaccination status by age, taken from the COVID-19 Vaccine surveillance report, week 43. The UKHSA report shows 930,013 total cases for the four weeks to 24 October. Data downloaded (31 October) from the coronavirus dashboard for the four weeks up to 24 October (inclusive) give 935,386. The additional cases are likely due to cases reported between the UKHSA analysis and 31 October.

  4. ONS population estimates based on ONS 2020 mid-year estimates, taken from NHSE&I weekly vaccine statistics, 28 October 2021 (downloaded 31 October).
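
For anyone wanting to reproduce note 2, here is a minimal sketch of the derivation in Python, assuming the metric has been downloaded from the coronavirus dashboard as a flat CSV. The file name is hypothetical; the column names are the dashboard metrics listed in note 2.

```python
import pandas as pd

# Hypothetical file name; columns are the dashboard metrics listed in note 2.
df = pd.read_csv("vaccinations_age_demographics.csv")

# NIMS total population (taken at 24 October in the analysis above)
total_population = df["VaccineRegisterPopulationByVaccinationDate"]

# Fully vaccinated: second doses at least 14 days before the end of the period
fully_vaccinated = df["cumPeopleVaccinatedSecondDoseByVaccinationDate"]

# "Not vaccinated" excludes anyone with at least one dose
not_vaccinated = total_population - df["cumPeopleVaccinatedFirstDoseByVaccinationDate"]
```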

Six steps to make your model a role model

My favourite number is 6. It was the number Roberto Carlos, my footballing idol, wore when he played for Brazil. As a kid I remember trying to recreate his famous banana free kick against France (1997) over and over again. Practice makes perfect, they say. So here, in honour of Senhor Carlos, I am going to provide 6 pieces of advice from our latest publication, Guidance for Models, so you can all practise turning your models into role models. If you just want to know why we made this guidance, skip to the end of this blog. 

Every year we throw away 2.5 billion takeaway cups in the UK

Don’t create your model using the same criteria as a disposable takeaway cup: use once and throw away. A model, or components of it, should be reusable – not just by you, but by others too. It should also be robust (see testing) and adaptable (see modular). Do not make a disposable cup just for coffee if someone else is making one just for tea, when you could work together to make a sturdy cup for both.
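
As a minimal sketch of the difference (the function and figures are illustrative, not from the guidance): a component written once as a documented, parameterised function can be picked up by any team, whereas a script with hard-coded inputs is a disposable cup.

```python
from typing import Sequence

def moving_average(values: Sequence[float], window: int = 3) -> list[float]:
    """Trailing moving average - a small, reusable model component.

    Works for any numeric series (tea or coffee), rather than being
    hard-coded to one team's dataset and thrown away after use.
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Reused by two different "models" with no changes:
print(moving_average([3.0, 4.0, 5.0, 6.0], window=2))  # [3.5, 4.5, 5.5]
print(moving_average([10.0, 20.0, 30.0], window=3))    # [20.0]
```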

Weed your borders

Like beautiful garden borders, models need maintenance. Borders may need mulching in the winter, pruning in the spring and lots of watering in the summer. Likewise, model dependencies may need updating, bugs resolving, and code changes made as a result of changes to data schemas. Teams need to plan their time accordingly to make sure the model still works and remains fit for purpose, especially as others may rely on your model’s outputs for their own model input.

Don’t be like a tragic episode of Grand Designs

We’ve all seen a Grand Designs episode where the keen, wannabe architect wants to design, build and project manage their own dream home. It often ends in disaster: over budget, late, relationship break-ups, and a lot of stress. Likewise, you need the correct people involved at the correct time to make your model a success. Model design plans should be scrutinised and checked. Experts should be consulted early on, instead of after things go wrong. Lastly, like all house builds, the model should be verified and signed off by someone you trust, to make sure it is safe and secure. 

It took Jean-François Champollion years to decipher the Rosetta Stone

Model documentation should be accessible and understandable to a wide range of audiences. You should not publish only detailed, technical documentation, as not everyone will be able to understand the purpose or scope of your model. This may lead to your model being misused. You should explain your model as best you can. If the nature of the model means it is hard to explain, you should at the very least describe how users can interpret your model and its outputs. You should open source your code if possible, and provide installation guides, code comments and examples as well. 
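
As a sketch of what layered documentation can look like in code (everything here is hypothetical), even a docstring can serve two audiences: a plain-language statement of purpose and limits first, technical detail after.

```python
def estimate_demand(prices: list[float], elasticity: float = -0.5) -> list[float]:
    """Estimate relative demand at each of the given prices.

    In plain language: this model predicts how demand changes as price
    changes. It assumes past behaviour carries over, so it should not be
    used for price ranges far outside the data it was calibrated on.

    Technical detail: constant-elasticity form, demand = price ** elasticity,
    with the elasticity estimated elsewhere. All values are illustrative.
    """
    return [price ** elasticity for price in prices]
```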

Be as analytical in choices as you are on Netflix

Aim to be as deliberate in your model decisions as you are when choosing your next TV show to binge watch. Like choosing between a horror and a rom-com, you must understand what kind of model you need based on your scope and aims. You should seek advice and guidance from experts. And like the impossibility of trying to stay up to date with the latest shows using a 90s copy of the Radio Times, your decisions should be based on relevant and up-to-date information. Lastly, don’t overcommit and carry on if things don’t seem to be going well. Use regular checkpoints to reassess against your original needs. No one wants to force themselves through another experience like season eight of Game of Thrones. 

Be as ethically minded as you are when switching your energy supplier 

You read an article last week about the environmental benefits of veganism and want to give it a go? Great. You switched to a 100% renewable electricity supplier? Way to go. You stopped going to that pub that treated its workers poorly? Power to the people! Now also understand that data, design choices and model selection can all have ethical implications. Power can be given to, or taken from, certain groups based on the models we create and use. Ethics should not be just a tick-box exercise; it should be the cornerstone of your model design and development.

Sure, nice analogies, but why did you actually create this guidance?

Last year the global pandemic thrust us into the limelight following a series of high-profile uses of statistics for decision making. One of the biggest pieces of work we did last year was our review of the approach to developing statistical models to award 2020 exam results. “Algorithms” were blamed, with one headline stating “Dreams ruined by an algorithm” (BBC NI website). As such, we have been concerned about the threat this poses to public confidence in statistical models more broadly. 

That exam review work took us into new frontiers, commenting on the use of statistical models that influence public life, not just those that produce official statistics. But statistical models are just one of a range of tools used by government analysts, data scientists and statisticians. Increasingly, newer techniques such as machine learning (ML) are being tested and deployed in the production of statistics and used to inform decisions. Furthermore, with the creation of the Office for Artificial Intelligence and the Government’s National AI Strategy, we are likely to see increased use of more advanced Artificial Intelligence (AI) techniques going forward.  

As a result, we identified this as a crucial time to provide guidance for the use of models, regardless of whether they are statistical models, machine learning models or AI models. There have been a number of publications on ethical guidance for models (the Ethics, Transparency and Accountability Framework for Automated Decision-Making, the Data Ethics Framework), as well as the creation of the UK Statistics Authority’s Centre for Applied Data Ethics. There are also a number of technical guides on how to develop models (the Aqua Book). However, we saw no current guidance that suitably brought together the social, ethical and technical aspects of all elements of model creation: data, design, development, delivery and deployment.  

We believe our role as a regulator, and our experience of last year’s exam review, put us in a prime position to provide this socio-technical guidance for models. As a result, we have published the alpha release of our model guidance, with the aim of obtaining feedback and comments from a wide range of users. 

If you have any feedback, please get in touch! We aim to release an updated version of the guidance in early 2022. 

Transparency: How open communication helps statistics serve the public good

Over the past 18 months we’ve talked a lot about transparency. We’ve made public interventions, such as our call for UK governments to provide more transparency around COVID data, and it’s been prominent in our vision for the future of analysis in government, including in our Statistical Leadership and State of the Statistical System reports.

But what do we mean when we talk about transparency? Why do we care? And what can be done to support it?

What do we mean by transparency?

Transparency is about working in an open way. For us, transparency means being open about the data being used. Explaining what judgements have been made about data and methods, and why. Being clear about the strengths and limitations of data – including what they can tell us about the world, and what they can’t. It also means making sure data and associated explanations are easy to find and clearly presented. It is at the core of many of the practices outlined in the Code of Practice for Statistics.

Why does it matter?

The pandemic has increased the public appetite for data and drawn attention to the significance of data in decision making. Many of us will have become familiar with the phrase “data, not dates” – a phrase the UK government used as it set out its road map for easing coronavirus restrictions. In a context where so many have been asked to give up so much on the basis of data, it is especially important that the data are understood and trusted. Transparency is essential to this.

Transparency supports informed decisions. Appropriate use of data is only possible when data and associated limitations are understood. We all make daily decisions based on our understanding of the world around us. Many of these are informed by data from governments, perhaps trying to understand the risk of visiting a relative or judging when to get fuel.

We also need this understanding to hold government to account. Clearly presented data on key issues can help experts and the public understand government actions. For example, is the UK taking appropriate action to tackle climate change? How effectively are governments managing supply chains?

Transparency gives us a shared understanding of evidence which supports decisions. It allows us to focus on addressing challenges and improving society, rather than argue about the provenance of data and what it means. It supports trust in governments and the decisions they make. It allows us to make better individual and collective decisions. Ultimately, it ensures that statistics can serve the public good.

What is government doing?

We have seen many impressive examples of governments across the UK publishing increasingly large volumes of near real-time data in accessible ways. One of the most prominent is the coronavirus dashboard and its equivalents in other parts of the UK, such as the Northern Ireland COVID-19 Dashboard.

It has become routine for data to be published alongside daily Downing Street briefings, and, through its additional data and information workbook, the Scottish Government has put in place an approach which enables it to release data quickly when necessary. We have also seen examples of clear explanations of data and the implications of different choices, such as the Chief Statistician’s update on the share of people vaccinated in Wales.

However, this good practice is not universal. Transparency regularly features in our casework. We have written public letters on a range of topics including Levelling Up, fuel stocks, hospital admissions and travel lists. We want to see a universal commitment to transparency from all governments in the UK. This should apply to data quoted publicly or used to justify important government decisions. Where data are not already published, mechanisms need to be in place to make sure data can be published quickly.

The Ministerial Code supports this ambition by requiring UK Government ministers to be mindful of the Code of Practice for Statistics – a requirement that is also reflected in the Scottish and Welsh Ministerial Codes and the Northern Ireland Guidance for Ministers. In response to a recent Public Administration and Constitutional Affairs Committee report the UK Government itself said:

“The Government is committed to transparency and will endeavour to publish all statistics and underlying data when referenced publicly, in line with the Code of Practice for Official Statistics.”

What is OSR doing?

We want to see statistics serve the public good, with transparency supporting informed decisions and enabling people to hold government to account. Over coming months, we will:

  • build our evidence base, highlighting good examples and understanding more about barriers to transparency.
  • continue to intervene on specific cases where we deem it necessary, guided by the UK Statistics Authority’s interventions policy.
  • work with external organisations and officials in governments to support solutions and make the case for transparency.

What can you do?

We’re under no illusion: OSR can’t resolve this on our own. Whether you’re an organisation or an individual, we need your help.

You can question the data you see. Does it make sense? Do you know where it comes from? Is it being used appropriately?

You can raise concerns with us via regulation@statistics.gov.uk – our FAQs set out what to expect if you raise a concern with us. We’d also love to hear from other organisations with an interest in transparency.

And you can keep up to date with our work via our newsletter.


Reflections on lessons learned from COVID for health and social care data

You may have noticed that the last 18 months or so have been rather unusual. In fact it’s getting difficult to think about what things were like before masks, distancing and the universal smell of alcohol gel.  And there’s another change to which we have become accustomed – the daily parade of statistics, the use of graphs on the news, and the huge presence of scientific and statistical discussion, both in the media and among ordinary people who are not even statisticians!

The scale and ambition of the health data being made available would have been unthinkable just two years ago, as would the complexity and sophistication of the analyses being conducted. But the Office for Statistics Regulation’s ‘Lessons Learned’ report argues that we should not be complacent: we need to press harder for more trustworthy, better quality, and higher value statistics.

There are a few recommendations that stand out for me. First, Lesson 9 focusses on improved communication. Back in May 2020 I stuck my neck out on the Andrew Marr show and criticised the press briefings as a form of ‘number theatre’, with lots of big and apparently impressive numbers being thrown around without regard for either accuracy or context. This attracted attention (and 1.7m views on Twitter). But although some dodgy graphs continued to appear, the presentation of statistics improved. Crucial to communication, however, is Lesson 1 on transparency – it is essential that the statistics underlying policy decisions, which affect us all, are available for scrutiny and are not cherry-picked to avoid those that might rock some political boat. This requires both constant vigilance and appropriate clout for professional analysts.

Lesson 7 deals with collaboration, reflecting the extraordinary progress that has been made both in collaboration across governments and with academic partners, all of whom have shown themselves (against archetype) to be capable of agile and bold innovation. The Covid Infection Survey, in particular, has demonstrated both the need for and the power of sophisticated statistical modelling applied to survey data. Although of course I would say that, wouldn’t I, as I happen to be chair of its advisory board – which has enabled me to see first-hand what a proper engagement between the ONS and universities can achieve.

Finally, Lesson 3 addresses the idea that data about policy interventions should not just enable us to know what is happening – essentially ‘process’ measures of activity – but help us to evaluate the impact of that policy. This is challenging; Test and Trace has come in for particular criticism in this regard. For statisticians, it is natural to think that data can help us assess the effect of actions, with the randomised clinical trial as a ‘gold-standard’, but with an increasing range of other techniques available for non-experimental data. Again there is a need to get this up the agenda by empowering professionals.

An over-arching theme is the need for the whole statistical system to be truly independent of political influence from any direction. While this is enshrined in legislation, a continued effort will need to be made to make sure that work with data lives up to the standards expressed in the Code of Practice for Statistics, in terms of trustworthiness, quality and value. The pandemic has shown how much can be achieved with the right will and appropriate resources, and OSR’s ‘Lessons Learned’ point the way forward.

 

David Spiegelhalter is a Non-Executive Director of the UK Statistics Authority, which oversees the work of the Office for Statistics Regulation.

Launching our consultation: We want your views

As a regulator, we want to be as helpful as we can to producers of statistics to enable the release of valuable information, while also setting clear standards of practice. In the pandemic we supported producers by granting exemptions to the Code of Practice for Statistics to enable some statistics to be released at times other than the standard time of 9.30am.

Market sensitive statistics could no longer be released after the usual lock-in briefings, so we agreed for them to be released at 7am. This has meant that the lead statisticians have been able to speak in the media and explain the latest results.

We also enabled some statistics related to COVID-19 to be released later in the day, as soon as they could be shared publicly with sufficient quality assurance. It has meant, for example, that both the Coronavirus Infection Survey bulletins and the Public Health Scotland COVID-19 weekly report for Scotland are released at noon.

Having a specific time of release has helped ensure consistency and build confidence that official statistics are truly independent of political preferences and interference. The pandemic has brought to light how important timely statistics are, and the huge demand has meant that release timings have had to change so that the statistics remain relevant and useful to the public.

As we look beyond the pandemic, we have been considering whether we should amend the Code of Practice to enable more flexibility for producers but at the same time keep the benefits of consistency and protection against interference. We are grateful to everyone who has shared their views with us in our discovery phase. It has helped us consider a range of issues.

Consultation

We are pleased to announce that a 12-week consultation will begin on 28 September 2021, ending on 21 December 2021. Our consultation paper will set out some proposals on release approaches that look to maintain the benefits of standard release times but also support some greater flexibility. The Authority will carefully consider the responses before deciding on its preferred release practice.

We encourage you to consider the suggestions and to share your views with us.

A closer look at loneliness statistics

At OSR, we have always been aware of the importance of loneliness statistics on a national and local scale. In 2019, we started a systemic review of loneliness statistics to investigate the state of official statistics on loneliness in the UK. 

Initially, we found there were some significant gaps in loneliness data that were not being filled by official or national statistics. Statistics users we spoke to, such as charities focused on loneliness, told us this made it more difficult for them to carry out their core functions of preventing and tackling loneliness among the UK population.  

We heard that good quality statistics that covered local and regional geographies were needed in order for them to deliver their services, allocate funding, and in some cases, present evidence to their regional parliaments. Where official statistics were not meeting these needs, expert users were often stepping in and producing their own statistics to fill data gaps. Given this, we identified a range of specific recommendations to help improve official statistics on loneliness. 

Like many pieces of work during this period however, the pandemic made us re-think our approach. The pandemic has changed how we all think and act, including how we think about loneliness. Understanding and addressing loneliness among the population has become a focus for governments and policy makers. In response, statistics producers have had to develop their loneliness statistics to meet society’s need for information. As a result, many good developments have happened in this area and we’ve found that statistics producers have been filling in some of the key gaps we identified when we first started looking at these statistics. Our new report published today commends the efforts by statistics producers in creating statistics that better serve the public good in answer to these societal changes. 

This isn’t to say that improvements can’t still be made, though. Users we spoke to during the pandemic still identified some key gaps in the official statistics landscape on loneliness. We would encourage statistics producers to build on what they have achieved in the last 18 months and to continue producing statistics that meet user needs and offer value for charities and academics in preventing and researching loneliness.  

Continuing the loneliness review was one of the first pieces of work I got when I started my placement year at the OSR last September. I’ve really enjoyed working on the report and having the opportunity to lead a review and conversations with producers. Seeing the report published on my last day at the OSR brings a wonderful and rather cyclical end to my year! The work isn’t ending with me though. As an organisation, we are looking forward to continuing working in this area and assisting producers to develop their statistics to better meet user needs. If you would like to contact us about this, please email my colleague, Emma Harrison. 

“Welp. We screwed up”: A lesson in admitting your data mistakes

A couple of months ago a tweet from the Governor of Utah caught my eye.

The background was that the Utah Department of Health had identified an error in its published vaccinations figures – fewer people had been vaccinated against coronavirus than had been reported. In a public letter to his fellow Utahns, Governor Cox admitted the mistake.  

Here at the Office for Statistics Regulation we love this kind of openness and action. Five things stood out to us from Governor Cox’s letter, which we think all statistics producers in the UK should be thinking about when it comes to data errors. 

  1. Identify your mistake

You can’t fix what you don’t know is broken. This is why rigorous quality assurance is so important. Statisticians need to regularly look under the hood of their analysis to assure themselves and others that it is correct. In Utah, a healthy dose of scepticism about an unexpectedly high vaccination rate made the data team double-, triple- and quadruple-check their figures, until they found the mistake. So, as a statistics producer ask yourself: how do I assure myself that my analysis is correct? Do the numbers look as I expect, and why or why not? What are the risk points for errors? If the root cause of an error isn’t obvious then it can help to ask the five whys until you reach it. 
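
A minimal sketch of what that assurance can look like in practice (all thresholds and names here are hypothetical): cheap automated checks that run every time the figures are produced, so an impossible or surprising number fails loudly before publication rather than after.

```python
def check_vaccination_figures(doses: int, population: int, previous_doses: int) -> None:
    """Fail loudly if the latest cumulative dose count is impossible or suspicious."""
    # Impossible: more first doses recorded than people in the population
    assert doses <= population, "more doses recorded than people in the population"
    # Impossible: a cumulative count should never decrease
    assert doses >= previous_doses, "cumulative total has gone down since last release"
    # Suspicious: unusually large jumps deserve a manual double-check
    if previous_doses and (doses - previous_doses) / previous_doses > 0.25:
        print("WARNING: >25% jump since last release - check before publishing")

# Hypothetical figures: passes the hard checks, no warning triggered
check_vaccination_figures(doses=2_100_000, population=3_300_000, previous_doses=2_050_000)
```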

  2. Be open

One of the things that impressed us most about this example was how direct and open the communication of the error was. There was clear ownership of the mistake and a willingness to correct the figures quickly, publicly and with humility. In the UK, our Code of Practice for Statistics requires government statistics producers to handle corrections transparently. It’s also important that government officials and ministers who use statistics are open about mistakes. 

  3. Fix it

Of course, once you have identified the mistake, it needs to be fixed. As well as being transparent about corrections, the Code of Practice asks that they are made as quickly as is practical. 

  4. Explain

In Utah, Governor Cox explained that while they had reported that 70% of adults had received at least one dose of a coronavirus vaccine, the actual figure was 67%. In a separate statement, the Utah Department of Health went into more detail about the numbers and the mistake. Statistics and data should help people understand the world around them. So, when admitting a data error, it’s important to clearly explain the impact of it – what has changed and what does this mean? 

  5. Improve

The last, but perhaps the most important, step is to learn from the mistake – so that you can avoid it, or something similar, happening again. In Utah, the data team re-examined their processes to prevent this general type of error from being repeated. Statistics producers should reflect on why a mistake was made and what can be done to avoid it in future – and then share what they have learned, and what action they are taking, with their users. 

Statistics and data produced by governments serve a vital purpose. They keep us informed, help us make decisions and allow us to hold our governments to account – so we must have confidence in them and the people who produce them. As Governor Cox said, “trust consists of two things: competence and ethical behaviours”. We agree. The Code of Practice places a strong emphasis on trustworthiness. We see that trustworthiness is demonstrated by organisations which are open, and where people who work on statistics are truthful, impartial, and appropriately skilled. We are all human, we mess up and we make mistakes – but we can retain trust by actively looking for our mistakes, being open when we find them and by learning from them.