Smart statistics: what can the Code tell us about working efficiently

Helen Miller-Bakewell, Head of Development and Impact at OSR, explores ways the Code of Practice for Statistics can help producers of statistics and analysis increase their efficiency while facing pressure on resources

Most of us working in government (or indeed beyond government) will be familiar with the feeling that we’d like more resource, be it time, money, people, or all three! 

In July, we published our view on the current state of the UK’s statistical system. This year’s report highlights the tremendous amount of insightful and influential statistics produced by government analysts, but also explores the challenges producers of statistics are increasingly facing from pressure on resources.  

Addressing pressure on resources is a complex problem, which will require a multifaceted solution. But it raises a very closely related question: how can producers of statistics and analysis work in an efficient way, to ensure they achieve the maximum value from the resources they do have? It is this question we will consider here. 

As ever, when faced with a question about the production of statistics or analysis, we first look to our Code of Practice for Statistics. Three principles within the Code speak immediately to this question: relevance to users; innovation and improvement; and efficiency and proportionality. Below, we outline ways to support efficiency in these areas; however, we are very conscious that, when facing pressure on resources, it can be hard to initiate and implement them. To this end, we hope the case studies and links to available support and guidance can help. Please get in touch if you would like further advice or support. 

Relevance to users (V1):

Users of statistics and data should always be at the centre of statistical production; their needs should be understood, their views sought and acted on, and their use of statistics supported. We encourage producers of statistics to have conversations with a wide range of users to identify where statistics can be discontinued, or reduced in frequency or detail, where appropriate. This can free up resource, while helping producers to fulfil their commitment to producing statistics of public value that meet user needs. Ofsted has recently done this to great effect. 

While effective user engagement itself takes time and expertise, this investment is key to ensuring resources are well-spent elsewhere. Undertaking public engagement collaboratively wherever possible, including working in partnership with policy makers and other statistics producers, can reduce the resource required. The Analysis Function User engagement strategy for statistics has a strong focus on collaboration and how this will be supported across the statistical system in the future, including through the User Support and Engagement Resource (USER) Hub and theme-based user groups and forums. 

Innovation and improvement (V4):

The UK statistical system should maintain the brilliantly responsive and proactive approach we have seen in the last few years and look to do this in a sustainable way. Improvements to data infrastructure, processes, and systems could all help. For example, the use of technology and data science principles, such as those set out in our 2021 Reproducible Analytical Pipeline (RAP) review, supports the more efficient and sustainable delivery of statistics. This review includes several case studies of producers using RAP principles to reduce manual effort and save time, alongside other benefits. The recent Analysis Function Reproducible Analytical Pipelines (RAP) strategy sets out the ambition to embed RAP across government, and the Analysis Function can offer RAP support through its online pages, its Analysis Standards and Pipelines Team and via the cross-government RAP champion network. 
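
To make the idea concrete, a core RAP principle is replacing manual, point-and-click steps with scripted, repeatable transformations. The minimal sketch below illustrates this in Python; all names and the data are illustrative assumptions, not drawn from any department's actual pipeline.

```python
# Minimal illustration of a RAP principle: a scripted, deterministic
# transformation that replaces a manual spreadsheet step. Running it
# twice on the same input always gives the same output, so the result
# can be reproduced, reviewed and version-controlled.

import csv
import io

def summarise_counts(csv_text: str) -> dict:
    """Aggregate record counts by category from raw CSV text."""
    counts: dict = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["category"]] = counts.get(row["category"], 0) + 1
    return counts

raw = "category\nA\nB\nA\n"
print(summarise_counts(raw))  # {'A': 2, 'B': 1}
```

Because the whole step lives in code, it can sit in version control alongside automated tests, which is where much of the time-saving RAP promises comes from.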

Efficiency and proportionality (V5):

Statistics and data should be published in forms that enable their reuse, and opportunities for data sharing, data linkage, cross-analysis of sources, and the reuse of data should be acted on. The visualisations and insights generated by individuals, from outside the statistical system, using easily downloadable data from the COVID-19 dashboard nicely demonstrate the benefits of making data available for others to do their own analysis, which can add value without additional resource from producers. Promoting data sharing and linkage, in a secure way, is one of OSR’s priorities and we are currently engaging with key stakeholders involved in data to gather examples of good practice, and to better understand the current barriers to sharing and linking. This will be used to champion successes, support positive change, and provide opportunities for learning to be shared.  

When we reflect on these three principles, three further common themes become apparent to ensuring their success: independent decision making and leadership (in particular, Chief Statisticians and Heads of Profession for Statistics having the authority to uphold and advocate the standards of the Code); professional capability, once more demonstrating the benefit of investing in training and skills even when resources are scarce; and collaboration.  

All the principles listed above are supported by case studies in our online Code. These, along with case studies in our reports, can offer inspiration and practical suggestions to help analysts implement the ideas discussed. We are always delighted to discover new case studies that we can share to inspire others: if you can offer a case study, please do get in touch.  

Pressure on resources poses a significant threat to the ability of government analysts to produce the insight government and the wider population needs to make well-informed decisions. Working in an efficient way will help address one part of this problem: it will help ensure maximum value is achieved with the resources that are available, which will in turn help others across government appreciate the benefit of having analysts at the table.  

If you would like to discuss any of the themes raised here, or offer a case study that could help support smarter working among other producers of analysis, please contact us on 

How can official statistics better serve the public good?

How good are government statistics? In a recent seminar we asked members of the Government Statistical Service for three words they would use to describe government statistics. Among the top words we got back were ‘trustworthy’, ‘quality’ and ‘informative’. It was striking how closely these aligned to the three pillars of our Code of Practice for Statistics – Trustworthiness, Quality and Value – and encouraging to us, as the regulator of official statistics, to hear our message echoed by others. 

Official statistics play a central role in answering society’s most important questions. The most salient questions currently facing society concern the COVID-19 pandemic, its impacts and societal responses to it. Data and analysis have been crucial in informing government and individuals’ decisions and supporting public understanding.  

But the uses of official statistics extend far beyond the pandemic into people’s everyday lives: whether you are making decisions as a head teacher, or choosing your child’s school, developing policy on social housing, or trying to decide whether and where you should buy a house, have an interest in your local library remaining open, or are considering the country’s major economic decisions, you may well be using official statistics. This is why it’s so important that the UK’s statistical system responds to society’s information needs with insightful statistics.  

In a world of increasingly abundant data, expectations are higher. Individuals have become accustomed to information on many aspects of society in near real time with increasingly detailed breakdowns. Official statistics need to respond to these demands for information. Our work as a regulator of official statistics puts us in a unique position to reflect on the UK government statistical system and, in July, we set out our view on the current state of government statistics. 

At their best, statistics and data produced by government are insightful, coherent, and timely. They are of high policy-relevance and public interest. During the COVID-19 pandemic, we’re seeing the kind of statistical system that we’ve always wanted to encourage – responsive, agile and focusing on users. However, the statistical system does not consistently perform at this level across all its work. In our report we address eight key areas where improvements could be made across the system. 

  1. Statistical leadership 
  2. Voluntary Application of the Code, beyond official statistics 
  3. Quality assurance of administrative data 
  4. Communicating uncertainty 
  5. Adopting new tools, methods and data sources 
  6. Telling fuller stories with data 
  7. Providing authoritative insight 
  8. User engagement 

In each area, as well as talking about what we would like to see, we highlight examples of statistical producers already doing things well, which others can learn from and build on.  

Our 5-year Strategic Business Plan sets out our vision and priorities for 2020-2025, and how we, as OSR, will contribute to fostering the Authority’s ambitions for the UK statistics system. In all our work, we will continue to champion the work producers do, celebrate the things they do well, and encourage them to continue to improve the statistics they produce so that, together, we can ensure that official statistics better serve the public good.  

Please get in touch if you’d like to discuss the report further. 

“Wouldn’t it be cool if…

…we could look at this against x! And y. And maybe a, b and c too…”

This felt like quite a common conversation with my team, back when I was analysing data in the Department for Digital, Culture, Media and Sport (DCMS) circa 2015.

The number of interesting questions and analyses we could do with our data, if we could only put it together with other data, felt potentially limitless. And what an amazing benefit these analyses could have to society – we’d basically be able to understand and improve everything!

But it wasn’t meant to be. We did try to match our survey data with data held by one other department and… it was painful! It took months to get to the point of being able to physically share and receive data and, once we had some data, getting it ready to analyse proved tricky too. In fact, it proved so difficult that, I’m ashamed to admit, I moved roles before I managed it.

OSR also continues to emphasise the power of linked data to produce better statistics. On paper, linking data sets might sound simple but, in practice, it is often difficult. This is why I’m so excited about the recent work we’ve seen from the Ministry of Justice (MoJ). MoJ is taking great steps to link up the administrative data sets it generates in its operational work, and to make them available for analysis by people outside of the department. This means that MoJ, and other interested parties, can more easily do analysis across different parts of the justice system, and beyond, to understand the journeys individuals take.

There are two projects I’d like to highlight:

         1. Data First

In collaboration with ADR UK (Administrative Data Research UK), MoJ is undertaking an ambitious data linkage project called ‘Data First’. OSR’s 2018 review of The Public Value of Justice Statistics highlighted the need for statistics that move from counting people as they interact with specific parts of the justice system to telling stories about the journeys people take. Data First is doing just that! It will anonymously link data from across the family, civil and criminal courts in England and Wales, enabling research on how the justice system is used and enhancing the evidence base to understand ‘what works’ to help tackle social and justice policy issues.

In June, we were delighted to hear that Data First reached its first major milestone. The first, research-ready dataset – a de-identified, case-level dataset on magistrates’ court use – was made available for accredited researchers through the Office for National Statistics (ONS) Secure Research Service (SRS). This data provides insight into the magistrates’ court user population, including the nature and extent of repeat users. It enables, for the first time, researchers to establish whether a defendant has entered the courts on more than one occasion and will drive better policy decisions to reduce frequent use of the courts. In August, a second output followed, this time a de-identified, research-ready dataset on Crown Court use. This dataset is also available through the SRS.

         2. Data shares with the Department for Education (DfE)

To improve understanding of the potential links between individuals’ educational outcomes and characteristics and their involvement or risk of involvement with crime and the criminal justice system, MoJ and DfE have created a de-identified, individual-level dataset, which links data from the Police National Computer (MoJ) and the National Pupil Database (DfE)[1]. The DfE data spans educational attainment, absence from school, exclusions and characteristics like special educational needs and free school meals eligibility. The MoJ data includes information on criminal histories and reoffending, court proceedings, prison and assessments of offenders. Linking this data will allow analysis that has previously not been possible, including: longitudinal analysis of trends in individuals’ characteristics and outcomes; analysis to inform the design of policies and processes that better support those at risk; and evaluations of the effectiveness of interventions. Accredited researchers can apply to access the data via the ONS SRS or MoJ’s Justice MicroData Lab.

This work follows The Children in Family Justice Data Share (CFJDS)[2], which started in 2012 and has resulted in a database of child-level data linked from across the MoJ, DfE and the Children and Family Court Advisory and Support Service (Cafcass). The CFJDS provides, for the first time, longitudinal data on the short and medium-term outcomes for children who experience the family justice system. The data are being used to build understanding of how different experiences and decisions made within the family court can impact on children’s educational outcomes, and subsequently, their life chances. In turn, they will provide more robust evidence on which to make policy decisions for children and their families.

What’s really exciting about both these projects is the way that the teams involved are tackling the challenges of data linkage. Instead of creating a big new IT system to try to join up the data, these projects are starting from a position of, “let’s take what’s in the current databases and see what we can get through anonymised matching.” The exact tools used vary between teams and departments but include established tools such as SAS Data Management Studio and SQL Server Management Studio (SSMS), which were used by MoJ and DfE respectively for linking crime and justice and NPD data. For data linkage done as part of Data First, MoJ has developed a new tool called Splink, written in the programming language Python. Splink is an open source library for probabilistic record linkage at scale: it’s free, and MoJ hope others in government (and beyond) will find it useful for their own data linkage and deduplication tasks. Rule-based matching algorithms, including ‘fuzzy matching’ algorithms – rules used to link data based on non-perfect matches between data variables – have been used to link individuals within and between data sets.
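
To give a flavour of rule-based fuzzy matching, here is a toy Python sketch. The record fields, names and similarity threshold are illustrative assumptions only, and real tools such as Splink use probabilistic (Fellegi–Sunter style) models rather than a single string-similarity rule.

```python
# Toy rule-based fuzzy matching between two record sets: link records
# whose birth years agree exactly and whose names are "close enough"
# under a normalised string-similarity score. Illustrative only.

from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalised string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_records(left, right, threshold=0.85):
    """Return (left_id, right_id) pairs satisfying the matching rules."""
    links = []
    for l in left:
        for r in right:
            # Rule 1: exact match on year; Rule 2: fuzzy match on name.
            if l["year"] == r["year"] and similarity(l["name"], r["name"]) >= threshold:
                links.append((l["id"], r["id"]))
    return links

courts = [{"id": "c1", "name": "Jon Smith", "year": 1980}]
education = [{"id": "e1", "name": "John Smith", "year": 1980},
             {"id": "e2", "name": "Jane Smyth", "year": 1975}]
print(link_records(courts, education))  # [('c1', 'e1')]
```

Even this toy version shows why linkage is hard: the threshold trades false links against missed links, which is exactly the kind of uncertainty probabilistic approaches are designed to quantify.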

These projects show what can be achieved when government departments, agencies and external organisations work together, and will help us start to achieve what my team and I hoped we could back in 2015. They will enable us to better understand individuals and society and, in turn, to make better decisions and policies, which will improve the justice system and outcomes for all individuals. I’m looking forward to seeing what comes next.


[1] To ensure the confidentiality and protection of data about children, access to DfE data extracts from the NPD is managed through tightly controlled processes.

[2], published 29 March 2018

Thinking about quality when producing statistics

“Quality means doing it right when no one is looking.” – Henry Ford


Official statistics inform government, the media and the public about the issues that matter most in society. To feel confident using official statistics, people must trust them: quality has an important part to play in earning this trust.

In April, we published a review of the quality of HMRC’s official statistics. HMRC invited us to carry out this review after identifying a significant error in one of its published National Statistics. The review provided an independent assessment of HMRC’s quality management approach and identified improvements to strengthen the quality of their official statistics.

We made nine recommendations, which HMRC has welcomed. Many of the recommendations will apply to other producers – not just to strengthen the quality of official statistics, but also to improve the quality of all analytical outputs.

This blog tells the story of the review and its findings, from the perspectives of HMRC and OSR. We hope to inspire other producers to think about how they can build on their own approach to quality, to ensure statistics meet the needs of the people who use them.

Jackie Orme, Programme Lead, HMRC

In 2019 HMRC identified an error in published corporation tax receipt statistics, which led to us having to make substantial revisions. This was a serious concern both internally for HMRC and for external users of HMRC statistics. In response we undertook a number of actions, including initiating an internal audit review and inviting OSR to review the principles and processes underpinning production of our official statistics.

The review by OSR was particularly important to us as statisticians and analysts in HMRC, to draw on expert and independent advice in improving our ways of working. While some of the findings could potentially be uncomfortable, the review would support our desire to take a broad and ambitious approach to improvement and the weight of OSR’s views and advice would give credence to the need for change.

The review was carried out efficiently and we were kept well-informed about progress. The OSR review team devoted lots of time to talking to staff and stakeholders across all grades and professions to get their input and views. This level of involvement has been helpful to us subsequently in securing initial engagement and agreement to changes across the organisation. For example, it helped us get active support from senior HMRC leaders to implement recommendations, such as creating a new cross-cutting team as part of our analysis function to build on our existing approach to data quality and assurance.

The review has given us the opportunity to reflect on data quality issues and the importance of having robust data to produce high quality statistics and analysis. We have built a substantial programme of work to implement the recommendations and are starting to recruit people to the new team. Some recommendations will be straightforward to implement. For example, we have already started to review our statistics outputs, in order to make sure analytical resource is being used effectively.

In contrast, other recommendations are more challenging to implement, in particular, mapping the journeys of our data within the department. This will take significant combined effort by analysts, data providers and data processors.

As highlighted in the report, HMRC has some older systems for processing and storing its administrative data and the review has been helpful in emphasising how essential it is for analysts to be involved in discussions and decisions around the design of future systems. These sorts of insights from the report have helped us build a case for increased resource and forge stronger links with data providers, to work together to improve the quality of HMRC’s statistics and analysis.

Helen Miller-Bakewell, Project Manager, OSR

We were really pleased when HMRC asked us to do this review: in doing so, it showed a proactive and open approach to strengthening the quality of its official statistics.

It’s the first time we’ve done a piece of work that looks across all of a producer’s official statistics at once – although we have now done something similar with the Defra Group (the Department for Environment, Food and Rural Affairs and its agencies and public bodies), with a focus on user engagement. Normally, we look at one set of statistics in detail, or we review how statistics on a topic area come together to meet user needs. This was somewhere in the middle!

To inform the review, we spoke with a wide range of people involved in the production of official statistics in HMRC; analysts working on the statistics directly, managers who oversee them and a handful of people indirectly involved in the production process, who own and supply data.

The OSR team spent about an hour with each individual or team we interviewed, during which we asked lots of questions about the production process. This helped us to understand how the quality of statistical outputs was managed in HMRC, and the challenges analysts can face.

It turned out to be a useful process for the producer teams as well, and we were asked for our question list a couple of times, to help them think about the quality of their statistics in the future. We’ve now packaged up this question list in a published guidance document, so that all producers can benefit from it.

The findings of the review highlight the issues that big operational departments working with administrative data can face with respect to quality and will ring true for other Government departments. The recommendations stress the importance of analysts fully understanding the nature and quality of data they are working with, and of building effective working relationships with data providers or managers to facilitate this.

In addition, OSR champions a broad approach to quality assurance of data and statistics, and regular reviews of publications to ensure analytical resource is being used effectively. The report emphasises the importance of having analytical leaders who champion and support changes and innovations that can enhance quality, while recognising that analysts do not operate in isolation and that long-term improvements to quality management rely on understanding, values and responsibility being shared across organisations.

We’re pleased the review has been so helpful to HMRC. We would like to thank everyone who gave their time to speak with us during the review. Their cooperation and openness were key to us arriving at findings that resonate with analysts working in HMRC and recommendations that will have a lasting positive impact on the quality of HMRC statistics.