The Code Pillars: Quality

When I joined OSR as a placement student last September, the Code of Practice for Statistics was unknown territory. It certainly sounded official and important. Was it password protected? Would I need to decipher something or solve a puzzle to get in?

It soon became clear to me that this elusive ‘Code’ was at the heart of everything I would be doing at OSR. Not wanting to remain in the dark any longer, I dutifully dragged it to the front of my bookmarks bar and began to familiarise myself with its contents. (Thankfully no complicated code-cracking required).

The Trustworthiness and Value pillars appeared to be pretty straightforward. Yet, something about the Quality pillar didn’t seem quite so inviting. It sounded like the technical, ‘stats-y stuff’ pillar that my degree background in economics and politics would surely leave me ill-equipped to understand.

*Spoiler alert* I was wrong.

It turns out that ensuring statistics are the highest quality they can be isn’t as complicated and technical as I once feared. Quality simply means that statistics do what they set out to do and, crucially, that the best possible methods and sources are used to achieve that.

There are lots of ways that statistics producers can meet these aims. For example, quality can be achieved through collaboration. This can be with statistical experts and other producers, to arrive at the best methods for producing data. It can also be with the individuals and organisations involved in the various stages of the production process – from collecting and recording, to supplying, linking and publishing. Collaborating in these ways not only helps to ensure that statistics are accurate and reliable, but also that they are consistent over time and comparable across countries.

There are lots of other important-sounding documents like our Code of Practice that set out national or international best practice and recognised standards and definitions for producing statistics and data, such as the GSS harmonisation standards and the Quality Assurance Framework for the European Statistical System. These also help producers ensure that their statistics and data meet the highest possible standards of quality.

Quality is not only important at the producer end of the equation, but at the user end too. It is vital that producers are transparent with their users about how they are ensuring the quality of their statistics. This means telling users about the steps they take to achieve this, and being clear with them about the strengths and limitations of the statistics with respect to the different ways in which they could be used.

For an indication of just how important quality is, the Quality Review of HMRC Statistics we conducted last year is a prime example. After identifying an error in its published Corporation Tax receipt statistics, HMRC asked us to assess its approach to managing quality and risk in the production of its official statistics. With the Code as our guide, we were able to review HMRC’s existing processes and identify potential improvements that could be made to reduce the risk of statistical errors in the future.

This is just one example of how high-quality data fulfils our vision of statistics that serve the public good. We have found many others across our work and we continue to support producers to consider quality when producing statistics. Last year, we published new guidance for producers on thinking about quality, which was inspired by the HMRC review and the questions we asked.

If you’re interested in finding out more about Quality and the other pillars of our Code, check out the Code of Practice website. I promise it’s not as scary or elusive as it sounds…

 

Did you know we have case studies on our Code website too? Here are some that highlight good practice in applying the Quality pillar of the Code.

  • Q1 – Ensuring source data is appropriate for intended uses
  • Q2 – Developing harmonised national indicators of loneliness
  • Q3 – Improving quality assurance and its communication to aid user interpretation

Statistics shining a light

To help us celebrate World Statistics Day on Tuesday 20 October, John Pullinger, President of the International Association for Official Statistics, has written this guest blog.

The last few months have been a dark time. There has been tragedy that has touched our families and communities. We struggle to see what is really going on as we try to make sense of the unfamiliar landscape that surrounds us. As the new reality begins to dawn, we need to get a clear picture so we can take the right steps to build a better future.

High quality, trustworthy and valued statistics help everyone see things for what they are. They shed light where there is gloom. My inspiration is Florence Nightingale whose 200th birthday we have celebrated this year. Famously she was the lady with the lamp, tending the sick and wounded in a war zone. Her example has drawn many into the wonderful vocation that is nursing.

When she came home, she worked to pull together the data that had been collected during the war. She produced data visualisations that are stunning in their beauty and devastating in their message. They show with outstanding clarity that the main thing that killed the soldiers was the conditions in the hospitals, not the wounds inflicted by the enemy.

As a nurse she cared about people but the light from her lamp reached only those in the room. As a statistician she cared about people and found a voice that demanded attention in the corridors of power. The insight from her statistics was a beacon that reached across the world. Her example calls statisticians to cherish their vocation as it does for nurses.

Today, as we think about the world after COVID, we are rightly showing much more love to our nurses. As in Florence Nightingale’s day the significance of statistics too is in the spotlight. Good statistics save lives. They enable our governments to take decisive and proportionate action and aid the creation of jobs and prosperity for all. They help identify injustice and enable the powerless to hold the powerful to account. They give us all a special way to assess the state of the world in which we live so that we can act to create an environment we want to live in and a sustainable future for our children. They are a guiding light for better decisions and better lives.

However, numbers used in public are not mere facts. The person using them is doing so to influence their audience. We need to know that the advertiser isn’t telling us about bias in the sample that generated 8 out of 10 likes for their product. We need to know that the politician isn’t telling us that being £1000 better off if you vote for them is based on lots of conditions that cannot be guaranteed. We need to know that the headline screaming “killer food increases risk of death by 20%” relates to a condition we are highly unlikely to get, so the extra risk is negligible. There is an unprecedented amount of data, a proliferation of channels to propagate it and often weak incentives to ensure that the information we receive is what it purports to be.
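To see why that last point matters, here is a small worked example in Python. The figures are invented purely for illustration, not taken from any real study: a 20% relative increase in the risk of a rare condition can amount to almost nothing in absolute terms.

```python
# Invented figures, for illustration only - not from any real study.
baseline_risk = 0.0001      # 1 in 10,000 people develop the rare condition
relative_increase = 0.20    # the "20% increased risk" in the headline

exposed_risk = baseline_risk * (1 + relative_increase)
absolute_increase = exposed_risk - baseline_risk

print(f"Baseline risk:      {baseline_risk:.4%}")      # 0.0100%
print(f"Risk if exposed:    {exposed_risk:.4%}")       # 0.0120%
print(f"Absolute increase:  {absolute_increase:.4%}")  # 0.0020%
print(f"Extra cases per 100,000 people: {absolute_increase * 100_000:.0f}")  # 2
```

Here the same “20% more risk” works out at roughly two extra cases per 100,000 people – exactly the kind of context that turns a scary headline into a negligible risk.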

Fact checking and regulation of statistics used in public life help us see into the darkest corners of falsification, manipulation and mind-games with numbers. The work of the Office for Statistics Regulation is a vital public service, working alongside Full Fact and other organisations. Statistics produced for the public good are released into a foggy and polluted atmosphere full of dodgy data within a climate that seems to get ever hotter. A climate where many people all too readily cloak their vested interests in a fake veneer of statistical claims.

Now is a time to support those in the statistical community who provide statistics for the public good and to stand up against those who misuse statistics for their own vested interest. If we are to navigate a way forward to a better future we need high quality, trustworthy and valued statistics (and respect for our nurses).

Thinking about quality when producing statistics

“Quality means doing it right when no one is looking.” – Henry Ford

 

Official statistics inform government, the media and the public about the issues that matter most in society. To feel confident using official statistics, people must trust them: quality has an important part to play in earning this trust.

In April, we published a review of the quality of HMRC’s official statistics. HMRC invited us to carry out this review after identifying a significant error in one of its published National Statistics. The review provided an independent assessment of HMRC’s quality management approach and identified improvements to strengthen the quality of their official statistics.

We made nine recommendations, which HMRC has welcomed. Many of the recommendations will apply to other producers – not just to strengthen the quality of official statistics, but also to improve the quality of all analytical outputs.

This blog tells the story of the review and its findings, from the perspectives of HMRC and OSR. We hope to inspire other producers to think about how they can build on their own approach to quality, to ensure statistics meet the needs of the people who use them.

Jackie Orme, Programme Lead, HMRC

In 2019 HMRC identified an error in published corporation tax receipt statistics, which led to us having to make substantial revisions. This was a serious concern both internally for HMRC and for external users of HMRC statistics. In response we undertook a number of actions, including initiating an internal audit review and inviting OSR to review the principles and processes underpinning production of our official statistics.

The review by OSR was particularly important to us as statisticians and analysts in HMRC, to draw on expert and independent advice in improving our ways of working. While some of the findings could potentially be uncomfortable, the review would support our desire to take a broad and ambitious approach to improvement, and the weight of OSR’s views and advice would strengthen the case for change.

The review was carried out efficiently and we were kept well-informed about progress. The OSR review team devoted lots of time to talking to staff and stakeholders across all grades and professions to get their input and views. This level of involvement has been helpful to us subsequently in securing initial engagement and agreement to changes across the organisation – for example, in getting active support from senior HMRC leaders to implement recommendations such as creating a new cross-cutting team as part of our analysis function to build on our existing approach to data quality and assurance.

The review has given us the opportunity to reflect on data quality issues and the importance of having robust data to produce high quality statistics and analysis. We have built a substantial programme of work to implement the recommendations and are starting to recruit people to the new team. Some recommendations will be straightforward to implement. For example, we have already started to review our statistics outputs, in order to make sure analytical resource is being used effectively.

In contrast, other recommendations are more challenging to implement, in particular, mapping the journeys of our data within the department. This will take significant combined effort by analysts, data providers and data processors.

As highlighted in the report, HMRC has some older systems for processing and storing its administrative data and the review has been helpful in emphasising how essential it is for analysts to be involved in discussions and decisions around the design of future systems. These sorts of insights from the report have helped us build a case for increased resource and forge stronger links with data providers, to work together to improve the quality of HMRC’s statistics and analysis.

Helen Miller-Bakewell, Project Manager, OSR

We were really pleased when HMRC asked us to do this review: in doing so, it showed a proactive and open approach to strengthening the quality of its official statistics.

It’s the first time we’ve done a piece of work that looks across all of a producer’s official statistics at once – although we have now done something similar with the Defra Group (the Department for Environment, Food and Rural Affairs and its agencies and public bodies), with a focus on user engagement. Normally, we look at one set of statistics in detail, or we review how statistics on a topic area come together to meet user needs. This was somewhere in the middle!

To inform the review, we spoke with a wide range of people involved in the production of official statistics in HMRC: analysts working on the statistics directly, managers who oversee them, and a handful of people indirectly involved in the production process who own and supply data.

The OSR team spent about an hour with each individual or team we interviewed, during which we asked lots of questions about the production process. This helped us to understand how the quality of statistical outputs was managed in HMRC, and the challenges analysts can face.

It turned out to be a useful process for the producer teams as well, and we were asked for our question list a couple of times, to help them think about the quality of their statistics in the future. We’ve now packaged up this question list in a published guidance document, so that all producers can benefit from it.

The findings of the review highlight the issues that big operational departments working with administrative data can face with respect to quality and will ring true for other Government departments. The recommendations stress the importance of analysts fully understanding the nature and quality of data they are working with, and of building effective working relationships with data providers or managers to facilitate this.

In addition, OSR champions a broad approach to quality assurance of data and statistics, and regular reviews of publications to ensure analytical resource is being used effectively. The report emphasises the importance of having analytical leaders who champion and support changes and innovations that can enhance quality, while recognising that analysts do not operate in isolation and that long-term improvements to quality management rely on understanding, values and responsibility being shared across organisations.

We’re pleased the review has been so helpful to HMRC. We would like to thank everyone who gave their time to speak with us during the review. Their cooperation and openness were key to us arriving at findings that resonate with analysts working in HMRC and recommendations that will have a lasting positive impact on the quality of HMRC statistics.

An analyst’s job is never done

‘Don’t trust the data. If you’ve found something interesting, something has probably gone wrong!’ Maybe you’ve been there too? It was a key lesson I learnt as a junior researcher. It partly reflected my skills as an analyst at the time – the mistakes could well have been mine! But, not entirely.

You see, I was working with cancer registration and deaths data, which on occasion could show odd patterns due to changes in disease classifications, diagnosis developments or reporting practices. Take a close look and you could spot the step changes when a classification change occurred. Harder to spot might be the impact of a new treatment or screening programme. But sometimes there were errors too – including the very human error of using the wrong population base for rates.
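As a rough illustration (the figures below are invented, not real registration data), here is how using an out-of-date population base quietly inflates a rate per 100,000 – the kind of error that looks plausible until you check the denominator:

```python
# Invented figures, for illustration only.
deaths = 450                      # deaths registered for a given cancer in one year
correct_population = 1_250_000    # the right mid-year population estimate for that year
stale_population = 1_100_000      # an out-of-date population base used by mistake

def rate_per_100k(events, population):
    """Crude rate per 100,000 population."""
    return events / population * 100_000

correct_rate = rate_per_100k(deaths, correct_population)  # 36.0
wrong_rate = rate_per_100k(deaths, stale_population)      # about 40.9

print(f"Rate on the correct base: {correct_rate:.1f} per 100,000")
print(f"Rate on the stale base:   {wrong_rate:.1f} per 100,000")
print(f"Spurious apparent excess: {wrong_rate / correct_rate - 1:.0%}")  # about 14%
```

Neither number looks implausible on its own, which is exactly why “do not trust the data, look for errors” is such good advice.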

I was reminded of this experience when Sir Ian Diamond, the National Statistician, spoke to the Health and Social Care Select Committee in May. He said (Q34):

“One of the things about good statisticians is that they are always just a little sceptical of the data. I was privileged to teach many great people in my life as an academic and I always said, “Do not trust the data. Look for errors.””

Sage advice from an advisor to SAGE!

The thing with quality is that the analyst’s job is never done. It is a moving target. In our Quality Assurance of Administrative Data guidance, we emphasise the importance of understanding where the data come from, how and why they were collected. But this information isn’t static – systems and policies may alter. And data sources will change as a result.

Being alert to this variation is an ongoing, everyday task. It includes building relationships with others in the data journey, to share insight and understanding about the data and to keep a current view of the data source. As Sir Ian went on to point out in his evidence, it should involve triangulating against other sources of data.

OSR recently completed a review of quality assurance in HMRC, at the department’s invitation. It was a fascinating insight into the operation of the organisation and the challenges it faces. We used a range of questions to help inform our understanding through meetings with analytical teams. They told us that they found the questions helpful and asked if we would share them to help with their own quality assurance. So, we produced an annex in the report with those questions.

And we have now reproduced the questions in a guide, as prompts to help all statistics producers think about their data and about quality under these headings:

  • Understanding the production process
  • Tools used during the production process
  • Receiving and understanding input data
  • Quality assurance
  • Version control and documentation
  • Issues with the statistics

The guide also signposts to a wealth of excellent guidance on quality on the GSS website. The GSS Best Practice and Impact Division (BPI) supports everyone in the Government Statistical Service in meeting the quality requirements of the Code and improving government statistics. BPI provides a range of helpful guidance and training.

  • Quality Statistics in Government guidance is primarily intended for producers of statistics who need to ensure that their products meet expectations for statistical quality. It is an introduction to quality and brings together the principles of statistical quality with practical advice in one place. You will find helpful information about quality assurance of methods and data and how to design processes that are efficient, transparent and reduce the risk of mistakes. Reproducible Analytical Pipelines (RAP) and the benefits of making analysis reproducible are also discussed (see the sketch after this list). The guidance complements the Quality Statistics in Government training offered by the GSS Quality Centre.
  • Communicating quality, uncertainty and change guidance is intended for producers of official statistics who need to write about and communicate effectively information about quality, uncertainty and change. It can be applied to all sources of statistics, including surveys, censuses, administrative and commercial data, as well as estimates derived from a combination of these. There is also a Communicating quality, uncertainty and change training.
  • The GSS Quality Centre has developed guidance which includes top tips to improve the QA of ad-hoc analysis across the GSS. The team also runs the Quality Assurance of Administrative Data (QAAD) workshop, in which users can get an overview of the QAAD toolkit and how to apply it to administrative sources.
  • There is also a GSS Quality strategy in place which aims to improve statistical quality across the Government Statistical Service (GSS) to produce statistics that serve the public good.
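As a loose sketch of the RAP idea mentioned above (the file names, columns and checks are hypothetical, not taken from the GSS guidance), a pipeline written as one re-runnable script replaces a chain of manual spreadsheet steps and builds the quality checks into the code itself:

```python
"""A minimal, hypothetical reproducible pipeline: raw data in, published table out.

Re-running this single script regenerates the output exactly, so the analysis can be
repeated, reviewed and version-controlled - the core idea behind RAP.
"""
from pathlib import Path

import pandas as pd

RAW = Path("data/raw/receipts_2020.csv")        # hypothetical input file
OUT = Path("outputs/receipts_by_region.csv")    # hypothetical published table

def load(path: Path) -> pd.DataFrame:
    return pd.read_csv(path)

def validate(df: pd.DataFrame) -> pd.DataFrame:
    # Automated checks instead of eyeballing a spreadsheet.
    assert df["amount"].ge(0).all(), "negative receipts found"
    assert df["region"].notna().all(), "missing region codes"
    return df

def summarise(df: pd.DataFrame) -> pd.DataFrame:
    return df.groupby("region", as_index=False)["amount"].sum()

def main() -> None:
    OUT.parent.mkdir(parents=True, exist_ok=True)
    summarise(validate(load(RAW))).to_csv(OUT, index=False)

if __name__ == "__main__":
    main()
```

The detail is illustrative; the point is that every step, including the quality assurance, is written down and repeatable rather than living in someone’s head.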

Check out our quality question guide and let us know how you get on by emailing me at penny.babb@statistics.gov.uk – we would welcome hearing about your experiences. We are always on the look-out for some good examples of practice that we can feature on the Online Code.

Piecing things together

People who provide services often need to know about local variations so that they can focus efforts in the right places. We are all witnessing this first hand at the moment in how the country is responding to COVID-19, for example, with a need for detailed geographical data to help NHS planning.

The Race Disparity Unit (RDU) is a team within the Cabinet Office. It is primarily a data and statistical unit which collates, publishes and analyses UK ethnicity data, works across Government on issues where ethnicity is an important factor, and engages with external stakeholders to understand different perspectives.

When RDU talks to users of its Ethnicity facts and figures website, they tend to say two things. First, it’s a great resource. It includes a wide range of data on different topics for different ethnic groups. And it presents the data in an accessible way. This makes us feel very happy.

But they also ask for data at the local authority (LA) level. Users find that regional or national figures mask local variations. They need to know about these variations so that they can deliver the right services – which makes perfect sense across the piece: detailed geographical data about where those aged over 70 live at a local level, to help provide support during COVID-19, is just one example, albeit an extreme and traumatic one, of this wider pattern.

This need is also true for small area ethnicity data. And the user demand for small area ethnicity data makes us feel a bit anxious, because our website doesn’t have much data for individual LAs. It does include a dashboard which shows the data we have for different geographies. But this doesn’t address the user need.

So RDU has linked together the datasets we have that include local authority data. This includes data on school performance, employment rates, and so on. It also includes data about local circumstances – for example, how deprived the area is. So far, we’ve made great progress with the prototype. But getting a range of datasets to talk to one another can be difficult. Many of them don’t follow statistical geography standards/best practice. We’ve talked about the various hurdles faced in a previous blog.

Our work on geography has made us think about how we can improve the value of the data on the website. “Value” is one of the three pillars of good statistical practice promoted by the Office for Statistics Regulation (OSR). It is hard-wired into its Code of Practice for Statistics (along with trustworthiness and quality).

First, context is everything. Statistics need to be relevant and reflect the lived experience to be most useful to a wider audience. The power of statistics is in providing insight through the aggregation of many individual data points to form a big picture. Context provides the colour for what would otherwise be a grey-scale image.

Many official statistics are not presented at ‘local’ levels. There can be good reasons for this, but without this information insight is narrowed. The Code of Practice encourages statisticians to provide data at the greatest level of detail that is practicable. Anything produced at the national level is usually required at the local level. And so, it’s worth all producers thinking about what information their users need and what the data tells them.

Be curious – see what patterns are in the data, by place.

Second, the little things matter. Putting the dot in St. Albans, or not, matters. A single full stop can be the difference between two datasets automatically linking together and the need for a manual correction. And while that single full stop will never be complex to resolve, it is rarely just a single full stop. Instead it is a series of manual corrections that are a barrier to the insight gained by linking data. Metadata on the year of the geographic classification used is also valuable to those of us wanting to join datasets. Local government structural boundaries can change every few years. We would rather know in advance that some of the records won’t match, than have to play trouble-shooter later.

While ‘place’ is flexible in its degree of specificity, it is best standardised. We can link key geographic information if variables are coded consistently. Bespoke coding frames get in the way of data linkage and reduce the value of the data.

Be consistent – enable the greater value of your data to be achieved by using harmonised codes.
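As a minimal sketch of why this matters (the two datasets below are invented, and the GSS codes are shown purely for illustration), joining on free-text place names breaks over a single full stop, while joining on harmonised codes just works:

```python
import pandas as pd

# Two invented local authority datasets; real work would use published sources.
employment = pd.DataFrame({
    "la_code": ["E07000240", "E06000023"],
    "la_name": ["St Albans", "Bristol, City of"],
    "employment_rate": [79.2, 74.8],
})
deprivation = pd.DataFrame({
    "la_code": ["E07000240", "E06000023"],
    "la_name": ["St. Albans", "Bristol"],   # the same places, spelled differently
    "imd_average_score": [10.5, 23.7],
})

# Joining on the free-text name silently loses both areas...
by_name = employment.merge(deprivation, on="la_name", how="inner")
print(len(by_name))  # 0 - "St Albans" does not match "St. Albans"

# ...whereas joining on the harmonised code matches every record.
by_code = employment.merge(deprivation, on="la_code", how="inner", suffixes=("", "_dep"))
print(len(by_code))  # 2
```

Carrying the standard codes alongside the names, and recording which year’s classification they come from, is what makes this kind of automatic linkage possible.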

Third, innovation is vital. Arguably the geography prototype is ‘only’ an Excel spreadsheet. What is innovative about it is the way that it draws data together. Over time this will support a mapping function, which will help bring the data to life and allow users to overlay different datasets at the LA level. We are already using the dataset to identify areas of policy interest and to target our engagement.

Another potential innovation – at present no more than a twinkle in RDU’s eye – is an Index of Multiple Ethnic Disparity (IMED). The IMED is analogous to MHCLG’s Index of Multiple Deprivation. It would allow users to identify those parts of the country where ethnic disparities are most pronounced, across a range of topics. If we were able to add in historical data, we would be able to look at the interplay between geography and time. There are some presentational, methodological and conceptual challenges in producing such an Index. RDU will begin to address these as we think about the use we want to make of data from the 2021 Census (see below).

Fourth, we can add value by working together, sharing perspectives and expertise. The RDU is keen to work with local authorities on the ‘geography prototype’. We are already working with Bristol City Council, which is using data to address ethnic disparities.

OSR have said that they will review the use of harmonised geographic codes and standards as part of their regulatory work. They will also provide guidance on meeting the standards of the Code of Practice. ONS’s Open Geography Portal makes it easier for data owners to use the correct classifications. Various groups across government can help unlock the potential of ethnicity data.

All it requires is shared commitment!

The ONS has a team that supports everyone in the GSS to improve official statistics. This is the Best Practice and Impact (BPI) division. BPI encourages everyone in the GSS to share best practice. One of the ways we do this is by running champion networks, including a geography network. If you would like to represent your department or share a piece of work you have done, please get in touch.

Fifth, more (and better) data will allow us to deliver much more value. RDU is starting to consider how to use data from the 2021 Census of Population. It will enable us to paint a far richer picture of the different ethnic groups than we can by using surveys or administrative sources. We are exploring the scope to link datasets to provide more geographical insights. And we are continuing to work with the ONS to improve the way that ethnicity is classified across government. Our goal is that in future users can compare data from different data sources directly.

This is a guest blog from Richard Laux (Cabinet Office) and Claire Pini (GSS Harmonisation Team in ONS).

The new Code of Practice is coming…

The new Code of Practice will be published next Thursday.

We’re really grateful for the huge amount of thought and effort you’ve contributed throughout our consultation process. It’s been amazing to see how much interest and enthusiasm the Code has generated.

I won’t give away too much, but a few things to look out for:

  • the Code will be based around the three pillars of trustworthiness, quality and value
  • there will be new interactive pages on this website, with links to guidance
  • we will also consult on our draft guide for voluntary application of the Code beyond official statistics, both inside and outside Government.

Our key message throughout this is that statistics are the lifeblood of democracy. The Code is built around public confidence in this essential public asset.

The Code will be used by statisticians and analysts on a daily basis. But it’s got a much wider reach. It helps Government organisations demonstrate that they live up to the highest standards. And it helps citizens have confidence in the statistics that describe the community and wider society they live in.

So we’re pretty excited to share this Code with you next week.

Data, quality and the Code

Having spent much of the past two years talking about how to approach the quality assurance of administrative data, I had grounded my thinking about the Quality pillar of the refreshed Code firmly in our Quality Assurance of Administrative Data (QAAD) framework (check it out here for pointers and case examples).

But the Quality pillar is more than that, and respondents to our consultation rightly pointed out that our draft Code had not gone far enough in addressing how the practices apply to other data types. So we have revisited these principles, to make them more widely applicable and simpler.

The Code and Quality following the Consultation

The structure of the Quality pillar is essentially the same as in the draft Code, based around a basic statistical process model: Suitable Data Sources, Sound Methods, and Assured Data Quality. But we are thinking carefully about how the principle of coherence fits into that model. We absolutely agree with the importance of the practices covering coherence, consistency and comparability – we tend to think, though, that they will be clearer when integrated into the relevant principles.

So, for example, the practice about internally coherent and consistent data fits with the principle on Suitable Data Sources. And a practice around using harmonised standards, classifications and definitions fits with the Sound Methods principle.

In fact, we are considering different aspects of coherence across the three pillars:

  • We are adding an emphasis on promoting coherence and harmonisation into the Head of Profession for Statistics role in Trustworthiness
  • In Value, the Insightful principle promotes explaining consistency and comparability with other related statistics
  • We are emphasising the use of consistent and harmonised standards when collecting data in the Efficient Data Collection and Use principle in Value, as these support data integration and the more efficient use of data

The European Statistical System’s Five Dimensions of Quality

Another area that received a lot of comment in the consultation was our definition of quality when compared with the Quality Assurance Framework of the European Statistical System (QAF).

QAF presents five quality dimensions: relevance; accuracy and reliability; timeliness and punctuality; coherence and comparability; and accessibility and clarity.

We completely agree with the importance of these dimensions but our structure of Trustworthiness, Quality and Value frames them in a way that helps relate the practice to the outcome we are seeking:

  • We see ‘relevance’ and ‘accessibility and clarity’ as central to our Value pillar. They are critical aspects of providing information to support decision making
  • We see ‘timeliness and punctuality’ and ‘coherence and comparability’ as cross-cutting each of the pillars – they speak to organisational processes and policies, meeting the needs of users for timely and comparable information, as well as relating directly to the nature of the quality of the statistics
  • We see accuracy and reliability as central to our Quality pillar; they inform each of the principles. We have revised the principle ‘Assured Data Quality’ to reflect the need for quality indicators to cover the areas of timeliness and coherence, as well as accuracy.
  • Producers should also monitor user satisfaction with each of the five quality dimensions under the Value principle ‘Reflecting the Range of Users and Uses’.

By regularly monitoring the quality indicators for the five quality dimensions and reporting them transparently, statistics producers can reassure users of the suitability of the statistics to meet their intended uses.