Fostering a robust government evaluation culture

Following the publication of our Analytical leadership: achieving better outcomes for citizens report in March 2024, we are running a series of blogs to highlight examples of strong analytical leadership in practice. Analytical leadership is a professional way of working with data, analysis or statistics that ensures the right data are available for effective policy and decision-making to improve the lives of citizens. Everyone in government can demonstrate analytical leadership, regardless of their profession or seniority, by drawing on the six enablers of analytical leadership and a ‘Think TQV’ approach.

In the first blog in this series, Catherine Hutchinson, Head of the Evaluation Task Force (ETF), talks about what the ETF has been doing to make sure that robust evidence is the driving force behind government spending decisions on policies and programmes. The ETF is all about building a culture where evidence takes centre stage, helping the government make smarter, more effective choices that really make a difference in people’s lives.

The work of the ETF has clear relevance to our analytical leadership findings, particularly the need to ‘foster an evidence-driven culture’, ‘demonstrate transparency and integrity’, ‘invest in analytical capability and skills’, and ‘draw on analytical standards and expert functions.’   

What is the Evaluation Task Force?

The Evaluation Task Force is a joint Cabinet Office-HM Treasury unit providing specialist support to ensure that robust evidence sits at the heart of government spending decisions. Its goal is to increase the effectiveness and efficiency of Government decision-making, and to improve confidence that the policies and programmes the government invests in are actually working and delivering the results that matter most. We were once described by a minister in a speech as the antidote to the “sugar rush” of policy making!

What have we been up to?

Since we got started in April 2021, the ETF has made big strides in building evaluation into the government’s spending processes, supporting departments to design and deliver robust evaluations, and building the capability of our civil servants to create better evidence. The ETF has advised on over 380 programmes across government, valued at £202 billion, and these evaluations will go on to generate evidence in key areas of government policy. These are individual programmes we are supporting, but the real impact comes from the culture change and changes to ways of working.

How are we fostering an evidence culture?

We’re delivering a range of activities to foster a culture of robust evaluation, while also promoting transparency and collaboration, enhancing evaluation capabilities and skills, and encouraging the use of analytical standards. As well as our role with HM Treasury, making sure evidence underpins spending proposals, we also work extensively with departments.

Supporting departments to design and deliver robust evaluations

  • We’ve supported 18 departments to develop and publish evaluation strategies which set out each department’s approach to evaluating programmes and building robust evidence, ensuring evaluation is integral to each department’s policy making process.
  • We are updating the Government’s major projects review, last conducted by the Prime Minister’s Implementation Unit in 2019. The findings, which are expected this year, will help us identify critical barriers, and suitable projects within the £805 billion portfolio, where the ETF can make a significant difference by providing evaluation advice and support.
  • The ETF manages the Evaluation and Trial Advice Panel (ETAP) – a free service to support civil servants to develop high quality and robust evaluations. The ETAP delivers a range of services including advice surgeries, one-to-one advice, document reviews, and teach-ins. The Panel has been running since 2015 and has supported over 200 government programmes during that time, promoting the use of high quality evidence across the civil service.

Building capability of our civil servants to create better evidence

  • We have built evaluation capability across the civil service by developing and delivering the Evaluation Academy, a train-the-trainer programme. The Academy has so far trained more than 100 civil servants, equipping them with the skills and knowledge they need to pass on to colleagues to make sure evidence informs their work. These trainers have already trained over 1,100 people in their departments.
  • The ETF has published an updated 5-year What Works Strategy, which outlines the government’s approach to improving the way it uses evidence from the “What Works” network to inform decisions about public services.

Funding innovative approaches and expanding the evidence base

  • We teamed up with our HM Treasury colleagues to design and scope the Labour Markets Evaluation and Pilots Fund. The fund is providing £37.5 million to expand the evidence base on what works to improve labour market outcomes, supporting the generation of high-quality evidence to drive positive change in the labour market. For example, we are funding a £7.4 million pilot scheme, announced in the recent Budget, to support AI skills for businesses.
  • The ETF also manages the £15 million Evaluation Accelerator Fund (EAF), which supports a range of departments to plug key evidence gaps in priority policy areas across government, such as crime, health and youth wellbeing, and supports innovative approaches to delivering public services. The fund helps create evidence-based solutions to pressing policy challenges and encourages innovative, data-driven approaches to public service delivery.

For example, earlier this year we had our first-ever evaluation-themed ministerial visit, with the Minister for the Cabinet Office (MCO) going up to West Yorkshire to learn more about the EAF-funded Domestic Abuse forensic marking project, part of a programme of interventions to tackle violence against women and girls, as well as projects designed to make the most of AI and machine learning.

What are we doing next?

We’re excited to launch a new Government Evaluation Registry this month. The Registry will bring together all planned, live and completed evaluations from Government Departments in a single accessible location, providing an invaluable tool for understanding “what works” in Government. The Registry will make it easier for policymakers and practitioners to access and use analytical evidence in their work.

Longer term, we will be continuing our work on embedding evaluation into major government programmes, so that evaluation becomes part of the project delivery furniture. We will be making sure all departments publish findings on policy evaluations in a timely and transparent way – stopping people reinventing the wheel and reinventing the broken wheel too! Ultimately, our mission is to make the UK Government a world leader in evidence-based policy making.

How can you get involved?

If you’re a civil servant who wants to learn more about evidence-based policymaking or to contribute to the ETF’s mission, consider:

  1. Getting in touch with the friendly Evaluation Leads for your department to understand how you can work with them, for example by attending the sessions the Evaluation Academy trainers put on to build your skills in using evidence to inform your work.
  2. Seeking advice from the Evaluation and Trial Advice Panel (ETAP) when designing an evaluation for your policy or programme to ensure it meets the highest standards of analytical rigour.
  3. Exploring the Government Evaluation Registry (once launched later this month) to learn from the findings of past evaluations and identify opportunities for collaboration in generating and using evidence to drive better outcomes.

Our work in building an evidence-driven culture within the government is absolutely crucial for ensuring that public funds are spent effectively and efficiently, and that policies and programmes are delivering benefits. By prioritising robust evaluation and evidence-based decision-making, the government can continuously improve its services and better serve the needs of its citizens.

Please get in touch with us to find out more, or visit our website.

What does it mean to be an accredited official statistic?

In our latest blog, one of our Head Statistics Regulators discusses what it means to be an accredited official statistic and how official statistics accreditation can help users and producers of statistics…

Here at the OSR we are responsible for carrying out assessments to determine whether a set of statistics can be confirmed as accredited official statistics – a designation previously known as being badged as ‘National Statistics’.

I often get asked: what does it mean to be an accredited official statistic?

The OSR defines accredited official statistics as official statistics that we have independently reviewed and confirmed as complying with the standards of trustworthiness, quality and value in the Code of Practice for Statistics (the Code).

For me, accreditation is a shortcut. It’s a quick way of signalling to users of the statistics that the standards of the Code have been met. It’s similar to a quality mark, but what quality is being assessed: that of the processes to produce the statistics or that of the statistics themselves?

“It’s more than just quality”

The Code encompasses much more than quality. Accreditation implies not only the good quality of the statistics themselves but also that they are presented, quality assured and disseminated according to set standards. It is about the value of the statistics for users and whether they are robust enough to bear the weight of decision-making required of them. Thus, accreditation considers both the processes and the statistics themselves, and the structures, people and organisations that support statistics planning, production and communication.

“Context and use matter”

When we assess the quality of the statistics themselves, we are not looking for them to meet a ‘gold standard’. We recognise that any statistic is only ever a best estimate at a particular point in time. Quality also depends on context and whether the statistics are good enough for their intended use, which will vary according to user and societal need. For example, more-timely but less-accurate data (where gaps in data sources are accepted to ensure timeliness) may be acceptable in one context but not in another. It can take two years after the reference period for a relatively comprehensive picture of the economy, and so of GDP, to emerge. However, more-timely GDP statistics are needed to inform policy, budgeting, investment and employment decisions in the public and private sectors.

However, what we do require is for producers of statistics to ensure that they are producing the most appropriate estimate available by ensuring suitable data sources are used, methods are robust and estimates are quality assured. This work should be carried out with the context and use of the statistics in mind and in an open, professional and transparent way, by making clear any limitations to the data, inherent uncertainty due to the timeliness of the data, or planned revisions so users can use the statistics appropriately for their needs. More detail on our approach to quality is provided in our publication Quality and statistics: An OSR perspective.

What if the statistics are not accredited?

If statistics are not accredited, it doesn’t mean that they aren’t trustworthy, of a high quality or valuable. It also doesn’t mean that they are. What it does mean is that we haven’t independently checked that they comply with the standards of trustworthiness, quality and value in the Code. Being a chartered naval architect or having a plumbing and heating qualification signals that someone external has checked you can do something to a particular standard, e.g. design and build ships, or install and maintain boilers and heating appliances. However, that doesn’t mean someone without official qualifications can’t also do the same things. You would just want to look for more evidence that they can, for example references, or evidence that they understand the relevant standards and rules they should be following.

We encourage all users of statistics to ask themselves some questions to ensure the data are fit for their purposes. These include considering where the data have come from, why they were collected, how well the data fit the concept you are trying to measure, what checks have been carried out to assure the data, and how you can access the data. More detail on things to consider is set out in our guidance on questions for data users.

What if the accreditation is removed?

Legally only we (the UK Statistics Authority) can remove the badge, i.e. the accreditation, from a set of statistics. We may decide on this course of action for a number of reasons. These could be related to concerns around the quality of the data sources used to produce the statistics, where user need is not being met or where substantial changes to the data sources and methods require us to conduct a review to ensure the quality of the data is such that they continue to be applicable for their intended use. The reason(s) should be included in the release and/or the announcement explaining why the accreditation has been removed.

What if some of the input statistics are not accredited?

Different sets of statistics often feed through into others. For example, migration data are used to inform population estimates and projections. Data from the labour force survey feed into productivity estimates. Producers of data and statistics should always quality-assure their data sources and be aware of any changes to them. The extent of quality assurance required and the weight placed on different data sources will vary depending on factors such as how much they affect the overall calculation, or whether there are any alternatives. We would expect this information to be communicated to users so they can understand the quality-assurance processes carried out and why the producer has decided that the data sources are fit for use. This is the case regardless of whether the source data are accredited or non-accredited.

How do I get my statistics accredited?

If you are a government department or official body that produces official statistics, and you have a set of statistics that you consider meets the standards of trustworthiness, quality and value in the Code, then you can ask us to assess them. The benefits of doing so include:

  • An independent assessment of the processes, methods and outputs used to produce the statistics against the recognised standards of the Code.
  • Public demonstration of your organisation’s commitment to trustworthiness, quality and public value.

Your first step towards assessment is to talk with your Head of Profession for Statistics who can provide guidance and points to consider.

If you are a producer of data, statistics and analysis which are not official statistics, whether inside government or beyond, you can contact us to discuss voluntary application of the Code. While this approach will not lead to accredited official statistics, it is a public demonstration of your commitment to the standards of the Code, which many organisations find beneficial for their work.

How do I find out more?

Related reading:

Futureproofing the Code of Practice for Statistics: findings and next steps from our review

National Statistics designation review

Our current position on regulating, responding to and using AI

In our latest blog, our Head of Data and Methods discusses the benefits and risks of AI in official statistics, and outlines OSR’s strategy for AI in the year to come…

Artificial Intelligence (AI) has quickly become an area of interest and concern, catalysed by the launch of user-friendly models such as ChatGPT. While AI appears to offer great opportunity, increased interest and adoption by different sectors has highlighted issues that emphasise the need for caution.

OSR is interested in actual and potential applications of AI to the production and use of official statistics, and to our own regulatory work, in the context of wider government use of AI. All Civil Service work must abide by the Civil Service Code to demonstrate integrity, honesty, objectivity and impartiality. Additionally, statistical outputs should follow the Code of Practice for Statistics by offering public value, being of high quality and coming from trustworthy sources. While AI models offer opportunities worth exploring, these need to be considered alongside risks, to inform an approach to use, and to regulation of use, that is in line with government standards and supports public confidence.

This blog post outlines our current position on AI and our plans for monitoring and acting on AI opportunities and risks in 2024.

The benefits of AI

AI models can quickly analyse large volumes of data and return results in a variety of formats, although, at the time of publishing this post, we are not aware of any examples of AI being used to produce official statistical outputs. There are, however, feasibility studies being undertaken by the Office for National Statistics (ONS) and other Government Departments to support statistical publications, and some examples of AI use in operational research, such as:

  • improving searchability of statistics on their respective websites,
  • production of non-technical summaries,
  • recoding occupational classification based on job descriptions and tasks, and
  • automatically generating code to replace legacy statistical methods.

Risks of AI

While the potential benefits of AI use in official statistics are high, there are risks that warrant application of the Code of Practice for Statistics pillars of trustworthiness, quality and value.


There is concern around how AI might be used by malicious external agents to undermine public trust in statistics and government. Concerns include:

  • promoting misinformation campaigns, ranging from targeted advertising to generated blog posts and articles, up to generated video and audio content impersonating senior leaders such as Rishi Sunak and Volodymyr Zelenskiy.
  • flooding social media and causing confusion around political issues such as general elections. AI could be used to generate more Freedom of Information (FOI) or regulation requests than a department can feasibly handle, causing backlogs or losing legitimate requests in the chaos.
  • AI-generated ‘hallucinations’ (when a generative AI tool produces outputs that are nonsensical or inaccurate) presenting incorrect information or advice that might, at best, raise questions about how public sector organisations use personal data and, at worst, open public sector bodies up to legal action.


Significant concerns have been raised regarding AI model accuracy and potential biases introduced via their training data, as well as the data protection implications of open, cloud-based models. The Government Digital Service found that their GOV.UK Chat had issues with hallucinations and accuracy that were unacceptable for public sector work. Given that most AI models operate within a “black box”, where the exact processes and methods are unknown and cannot be traced, it is difficult for producers to be completely transparent about how these systems produce their outputs. Close monitoring of developments in the field of AI and continual communication with statistics producers will be vital to understand the different ways AI systems may be used in both statistical production and statistical communication.


The concerns around the trustworthiness and quality of AI-generated statistical outputs and communications impact their perceived value, both to organisations and to the public. The latest wave of the Public Attitudes to Data and AI Survey suggests that public sentiment towards AI remains largely negative, despite the perceived impact of AI being reported as neutral to positive. The potential value will emerge over time as more AI products make their way into widespread use.

OSR’s strategy for AI in 2024

We are considering AI and our response through two lenses:

  • Use of AI systems, such as Large Language Models (LLMs), in the production and communication of official statistics, and how OSR regulates this; and,
  • Responding to use of AI to generate misinformation.

Regulating use of AI systems in the production and communication of official statistics

There are many organisations developing guidance for how AI should be used and regulated, and OSR is following these conversations. So far, we have contributed to the Pro-innovation AI Regulation policy paper from the Department for Science, Innovation and Technology, a white paper on Large Language Models in Official Statistics published by the United Nations Economic Commission for Europe, and the Generative AI Framework for His Majesty’s Government, published by the Central Digital and Data Office. We endorse the direction and advice offered in these frameworks and consider that they provide solid principles for regulating the use of AI in official statistics.

Responses to our recent review of the Code suggested people think that the Code does indirectly address issues around AI use for official statistics, both in terms of encouraging exploration of potential benefits and controlling quality risks. Going forward, providing guidance relating to specific issues around AI alongside the Code could allow OSR to provide relevant support in a dynamic way. We already have our Guidance for Models, which explains how the pillars in the Code help in designing, developing and using statistical models and is very relevant in this space. More widely, the Analysis Function will also be undertaking work to ensure that analytical guidance reflects the use of AI within analysis in future.

OSR will continue to discuss potential and planned use of AI in official statistics production with producers, to stay aware of use cases as they develop, which will inform our onward thinking.

Responding to use of AI to generate misinformation

With a UK election to be held this year, it is vital to understand how AI systems may be used to compromise the quality of statistical information available to the public, and how the same technology may be used to empower producers and regulators of statistics to ensure statistics serve the public good. We will continue to be involved in several cross-government networks that deal with AI. These include the Public Sector Text Data Subcommunity, a large network developing best practice guidance for the use of text-based data across the public sector, as well as other government departments and regulatory bodies thinking about the use of information during an election.

Next steps

There will be many more unforeseen uses for this versatile group of technologies. As AI developments are occurring at speed, we will be regularly reviewing the situation and our response to ensure compliance with the Code. If you would like to speak to us about our thinking and position on AI, please get in touch. We are particularly keen to hear of any potential or actual examples of AI being used to produce official statistics.

A statistical jigsaw: piecing together UK data comparability

In our latest blog, our Head of Private Office discusses comparability of data across the UK, which was topical at a recent Public Administration and Constitutional Affairs Select Committee…

When Ed Humpherson, Director General of OSR gave evidence recently to the Public Administration and Constitutional Affairs Select Committee (PACAC), one of the issues raised at the session was comparability of data across the UK.

For context, in 2023 the Committee launched their inquiry focused on transforming the UK’s statistical evidence base (you can read more about the issue of transparency that Ed explored with the Committee in his earlier blog). Ed was the last witness to give evidence to the inquiry, and the issue of comparability came up in several of the previous sessions with other witnesses.

Meeting user needs is not always straightforward, especially when that need is comparing data across the UK. As Ed explained to the Committee, the configuration of public services will probably be different across the UK, because of different policy and delivery choices that have been made by the distinct devolved governments. This is the nature of devolution, but a consequence is that administrative data may be collected, and reported, on different bases.

In our view, though, it is not sufficient for producers to simply state that statistics are not comparable. In line with the Code of Practice for Statistics they should recognise the user demand, and explain how their statistics do, and do not, compare with statistics in other parts of the UK. And producers should undertake analysis to try to identify measures that do allow for comparison, or to provide appropriate narrative that helps users understand the extent of comparability.

A very good example of this approach is provided by statisticians in the Welsh Government. Their Chief Statistician published two blogs on the comparability of health statistics, Comparing NHS performance statistics across the UK and Comparing NHS waiting list statistics across the UK. These blogs recognise the user demand and set out additional analysis carried out by analysts at the Welsh Government in collaboration with analysts in NHS England to accurately understand the differences between the definitions of NHS waiting times between the two nations. The blogs then adjust Wales’s own figure to produce an additional measure which is broadly comparable with that of England. More generally, the Chief Statistician’s blogs are a good example of providing guidance and insight to users across a wide range of statistical issues.

In addition, the Welsh Government’s monthly NHS performance release also highlights what can, and cannot, be compared.

And it’s not just the Welsh Government. During the evidence session Ed also mentioned the approach taken by NHS England to highlight the most comparable accident and emergency statistics. NHS England provide a Home Nations Comparison file for hospital accident and emergency activity each year. Since the session, statisticians from across the UK have jointly produced analysis of the coherence and comparability of A&E statistics, and advice on how they should and should not be compared, published on 28 February.

More generally, statisticians across the UK are undertaking comparability work across a range of measures. It is also important to recognise that at the level of health outcomes – things like smoking rates and life expectancy – figures are less related to the delivery of NHS services and are therefore more readily comparable. In addition to work on health comparability, statisticians have examined other cross-UK issues. For example, there is also a very good analysis of differences in fuel poverty measurement across the four nations.

So, whilst we at OSR, of course, champion comparability of data and believe it should be a priority for government, we are not alone. The examples in this blog demonstrate that statisticians are recognising, and taking steps to meet, user demand for comparability. And we have written to the Committee to highlight the activities that are described here.

We are looking forward to the results of the inquiry and its recommendations on how we can all play a role in transforming the UK statistical evidence base for the better.


Related correspondence:

Ed Humpherson to William Wragg MP: Supplementary evidence to the Public Administration and Constitutional Affairs Committee

How do we use statistics in everyday life?

In our latest guest blog, Johnny Runge and Beti Baraki from the Policy Institute at King’s College London discuss how individuals may use statistics in their personal lives, and they ask you to get in touch with suggestions.

We are launching an exciting new project on whether and how people use statistics to make personal decisions, and we want your help. Please read more below, and then complete this short survey.

Our lives are shaped by the decisions we make, and they can often be daunting, especially when they are big life decisions. But how do we arrive at these decisions? In unravelling this, we can consider whether decisions are deliberate or coincidental, whether they rely on intuition or tradition, and what role other people play, such as whether we seek advice or are affected by friends and family, or by companies, celebrities and influencers.

Within this inevitably complex range of factors is our research question: to what extent do we consider evidence, data and official statistics when we make a personal decision?

If statistics are taken into account, we want to deepen understanding of this process. We seek to gain insights into how deliberate the use of statistics is, including whether people realise they are using them at the time, and whether they actively seek out these statistics and data or come across them by chance. For instances where statistics are used, we are also interested in the extent to which individuals feel this improved their decisions.

If people do not consider any data, we want to better make sense of why not. We aim to uncover whether these individuals would have done so if they had known relevant statistics existed, if the statistics had been more accessible, or if they had trusted them more.

These are the types of questions we will explore in a new project using semi-structured interviews, commissioned by the Office for Statistics Regulation (OSR), and led by the Policy Institute at King’s College London, in collaboration with the Behavioural Insights Team. OSR has previously written about why they are interested in the role statistics play in personal decision-making, and how they see it as crucial that the wider public can use official statistics directly to inform their decisions.

To start the project, we want to create a list of as many examples as possible about how statistics can (or should) be used to inform personal decisions. We ask for your help, whether you are a member of the public, or a statistician, researcher or policymaker.

Do you have any ideas or suggestions about decisions that could (or should) be informed by data or statistics? Or, if you are an expert in a certain area, think about the key statistics in your field and what personal decisions they could potentially inform. If so, we would really appreciate it if you completed this brief survey.

Transparency: bringing the inside out

In our latest blog Director General for Regulation, Ed Humpherson, discusses the divergence between internal positivity and external scepticism about analysis in Government, and how transparency is key to benefitting the public good…

Seen from within Government, these are positive times for analysis. There is an analysis function, headed by Sir Ian Diamond, which continues to support great, high-profile analytical work. There are strong professions, including economists, statisticians, operational researchers and social researchers, each with strong methods and clear professional standards. There is an Evaluation Task Force, which is doing great things to raise the profile of evaluation of policy. And data and analysis are emphasised by Ministers and civil service leaders like never before – exemplified by the 2023 One Big Thing training event focused on use of data and analysis in Government.

Yet the perspective from outside Government is quite different. The Public Administration and Constitutional Affairs Select Committee has been undertaking an inquiry into Transforming the UK’s Statistical Evidence Base. Several witnesses from outside Government who have given evidence, and some of the written evidence provided, highlight concerns about the availability of analysis and how it is used. In particular, witnesses questioned whether it is clear what evidence sources inform policy decisions.

What explains this divergence between internal positivity and external scepticism?

In my view, and as I said in my own evidence before the Committee, it all comes down to transparency. By this I mean: the way in which analysis, undertaken by officials to inform Ministers, is made available to external users.

This is highly relevant to the Committee’s inquiry. A key question within the inquiry is the way in which external users can access analysis undertaken within Government.

These questions are very relevant to us in OSR. We have developed the principle of Intelligent Transparency. You can read more here, but in essence, Intelligent Transparency is about ensuring that, when Government makes statements using numbers to explain a policy and its implementation, it should make the underlying analysis available for all to see.

As I explained to the Committee, we make interventions when we see this principle not being upheld – for example, here and here. When we step in, departments always respond positively, and the analysts work with policy and communications colleagues to make the evidence available.

My basic proposition to the Committee was that the more Government can comply with this principle, the more the gap between the internal insight (there’s lots of good analysis) and the external perception (the analysis isn’t used or made available) will close. This commitment to transparency should be accompanied by openness – a willingness to answer questions raised by users, and a willingness to acknowledge the inherent limitations and uncertainties within a dataset.

In terms of what we do at OSR, I wouldn’t see any point, or value, in us going upstream to consider the quality of all the analysis that circulates within Government.

Our role is about public accessibility and public confidence – not about an internal quality assurance mechanism for economics, operational research, social research and other types of analysis undertaken in Government. We are not auditors of specific numbers (i.e. a particular figure from within a statistical series) – something we have to reiterate from time to time when a specific number becomes the focus of political debate. We have neither the resources nor the remit to do that. But we DO have both the capacity and the framework to support the appropriate, transparent release and communication of quantitative information.

This is the heartland of our work on statistics, and it’s completely applicable to, say, economic analysis of policy impacts, or evaluations of the impact of Government policy. There are good arrangements for the quality of economic analyses through the Government Economic Service (GES), and the quality of evaluations through the Evaluation Task Force (ETF); and similarly for the other disciplines that make up the Analysis Function. The ETF is a new kid on this particular block, and it is a great innovation, a new force for driving up the standards and openness of Government evaluations.

Where we add value is not in duplicating the GES, or ETF, or similar professional support structure within Government. Indeed, we already work in partnership with these sources of support and professional standards. Our expertise is in how this quantitative information is communicated in a way that can command public confidence.

In short, then, it really does come down to a question of transparency. As I said to the Committee, it’s like a garden in the early morning. Some of it is in the sunlight already, and some of it still in shade. Gradually, we are seeing more and more of the lawn come into the sunlight – as the reach of transparency grows to the benefit of the public.

The success and potential evolution of the 5 Safes model of data access

In our latest blog Ed Humpherson, Director General for Regulation discusses the 5 Safes model as a key feature to support data sharing and linkage…

In OSR’s data linkage report, we highlighted the key features of the data landscape that support data sharing and linkage. The 5 Safes model is one of those. Yet we also recommended that the 5 Safes model be reviewed. In this blog, I want to focus on one aspect of the model and set out the case for a subtle but important change.

The 5 Safes model is an approach to data use that has been adopted widely across the UK research community, and has also been used internationally. It is well-known and well-supported and has had a significant impact on data governance. It is, in short, a huge success story. (And for a short history, and really interesting analysis, see this journal article by Felix Ritchie and Elizabeth Green).

The 5 Safes are:

  • Safe data: data is treated to protect any confidentiality concerns.
  • Safe projects: research projects are approved by data owners for the public good.
  • Safe people: researchers are trained and authorised to use data safely.
  • Safe settings: a secure environment, such as a SecureLab, prevents unauthorised use.
  • Safe outputs: screened and approved outputs that are non-disclosive.

Any project that aims to use public sector administrative data for research purposes should be considered against the 5 Safes. The model therefore provides a criteria-based framework for assuring the appropriateness of a particular project.
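To make the criteria-based idea concrete, the five criteria can be pictured as a simple checklist where a project proceeds only when every “safe” is satisfied. This is an illustrative sketch only – in practice the assessment is a governance judgement made by people and approval processes, not by code:

```python
from dataclasses import dataclass, fields

# Illustrative sketch: the 5 Safes as a checklist. Field names mirror the
# criteria listed above; the boolean framing is a simplification for clarity.

@dataclass
class FiveSafesAssessment:
    safe_data: bool      # data treated to protect confidentiality
    safe_projects: bool  # project approved by data owners for the public good
    safe_people: bool    # researchers trained and authorised
    safe_settings: bool  # secure environment prevents unauthorised use
    safe_outputs: bool   # outputs screened and non-disclosive

    def unmet(self) -> list[str]:
        """Return the names of any criteria not yet satisfied."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def approved(self) -> bool:
        """A project proceeds only when every criterion is met."""
        return not self.unmet()

assessment = FiveSafesAssessment(True, True, True, True, False)
print(assessment.approved())  # False
print(assessment.unmet())     # ['safe_outputs']
```

The point of the sketch is that the criteria work as a set: failing any one of them blocks approval, which is also why the “graphic equaliser” image discussed later in this post is apt.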

OSR’s recommendations relevant to the 5 Safes:

In July 2023, OSR published our report on data sharing and linkage in government. We had a range of findings. I won’t spell them out here, but in short, we found a good deal of progress across Government, but some remaining barriers to data sharing and linkage. We argued that these barriers must be addressed to ensure that the good progress is maintained.

We made two recommendations relevant to the 5 Safes:

  • Recommendation 3 (The Five Safes Framework): Since the Five Safes Framework was developed twenty years ago, new technologies to share and link data have been introduced and data linkage of increased complexity is occurring. As the Five Safes Framework is so widely used across data access platforms, we recommend that the UK Statistics Authority review the framework to consider whether there are any elements or supporting material that could be usefully updated.
  • Recommendation 10 (Broader use cases for data): To support re-use of data where appropriate, those creating data sharing agreements should consider whether restricting data access to a specific use case is essential or whether researchers could be allowed to explore other beneficial use cases, aiming to broaden the use case where possible.

We made the recommendation about reviewing the framework because a range of stakeholders mentioned to us the potential for updating the 5 Safes model, in the light of an environment of ever-increasing data availability and ever-more powerful data processing and analysis tools.

And we made the recommendation about broader use cases because this was raised with us as an area of potential improvement.

The use of 5 Safes in research projects

What brings the two recommendations together is the 5 Safes idea of “safe projects”. This aspect of the model requires research projects to be approved by data owners (essentially, the organisations that collect and process the data) for the public good.

For many research activities, this project focus is absolutely ideal. It can identify how a project serves the public good, what benefits it is aiming to bring, and any risks it may entail. It will require the researcher to set out the variables in the data they wish to explore, and the relationships between those variables they want to test.

For some types of research, however, the strictures of focusing on a specific project can be limiting. For example, a researcher who wants to establish a link between wealth and some aspects of health may not know in advance which of the variables in a wealth dataset, and which of the variables in a health dataset, they wish to examine. Using the “safe project” framing, they might have to set out specific variables, only to discover that they are not the most relevant for their research. They might then have to go back to the drawing board, seeking “safe project” approval for a different set of variables.

Our tentative suggestion is that a small change in focus might resolve these problems. If the approval processes focused on safe programmes, this would allow approval of a broad area of research – health and wealth data sets – without the painstaking need to renew applications for different variables within those datasets.

What I have set out here is, of course, very high level. It would need quite a lot of refinement.

Other expert views on the 5 Safes

Recognising this, I shared the idea with several people who’ve spent longer than me thinking about these issues. The points they made included:

  • Be careful about placing too much emphasis on the semantic difference between programmes and projects. What is a programme for one organisation or research group might be a project for another. More important is to establish clearly that broader research questions can be “safe”. Indeed, in the pandemic, projects on Covid analysis and on Local Spaces did go ahead with a broader-based question at their heart.
  • This approach could be enhanced if Data Owners and Controllers are proactive in setting out what they consider to be safe and unsafe uses of data. For example, they could publish any hard-line restrictions (“we won’t approve programmes unless they have the following criteria…”). Setting out hard lines might also help Data Owners and Controllers think about programmes of research rather than individual projects by focusing their attention on broader topics rather than specifics.
  • In addition, broadening the Safe Project criterion is not the only way to make it easier for researchers to develop their projects. Better meta data (which describe the characteristics of the data) and synthetic data (which create replicas of the data set) can also help researchers clarify their research focus without needing to go through the approvals process. There have already been some innovations in this area – for example, the Secure Research Service developed an exploratory route that allows researchers to access data before putting in a full research proposal – although it’s not clear to me how widely this option is taken up.
  • Another expert pointed out the importance of organisations that hold data being clear about what’s available. The MoJ Data First programme provides a good example of what can be achieved in this space – on the Ministry of Justice: Data First page on GOV.UK you can see the data available in the Datasets section, including detailed information about what is in the data.
  • Professor Felix Ritchie of the University of the West of England, who has written extensively about data governance and the 5 Safes, highlighted for me that he sees increasing “well-intentioned, but poorly thought-through” pressure to prescribe research as tightly as possible. His work for the ESRC Future Data Services project sees a shift away from micro-managed projects as highly beneficial – after all, under the current model “the time risk to a researcher of needing a project variation strongly incentivises them to maximise the data request”.

More broadly, the senior leaders who are driving the ONS’s Integrated Data Service pointed out that the 5 Safes should not be seen as separate minimum standards. To a large extent, they should be seen as a set of controls that work in combination – the image of a graphic equaliser to balance the sound quality in a sound system is often given. Any shift to Safe Programmes should be seen in this context – as part of a comprehensive approach to data governance.

Let us know your thoughts

In short, there seems to be scope for exploring this idea further. Indeed, when I floated this idea as part of my keynote speech at the ADR UK conference in November, I got – well, not quite a rapturous reception, but at least some positive feedback.

And even if it’s a small change, of just one word, it is nevertheless a significant step to amend such a well-known and effective framework. So I offer up this suggestion as a starter for debate, as opposed to a concrete proposal for consultation.

Let me know what you think by getting in touch.

Producing, reviewing, and always evolving: UKHSA statistics

In our latest guest blog Helen Barugh, Head of Statistics Policy and User Engagement, discusses transforming the statistics produced by the UK Health Security Agency…


What is the Health Security Agency?

The UK Health Security Agency (UKHSA) is responsible for protecting every member of every community from the impact of infectious diseases, chemical, biological, radiological and nuclear incidents and other health threats. We are an executive agency of the Department for Health and Social Care.

We collect a wide range of surveillance data about diseases, ranging from influenza and COVID-19 to E. coli and measles. We publish statistics related to planning for, preventing and responding to external health threats. You can find our statistics here.

UKHSA was born in October 2021 during the COVID-19 pandemic. The pandemic had a significant impact on the organisation, including statistical production, and the repercussions of that are still being felt.

Producing and reviewing official statistics

I joined UKHSA in August 2022, one of four recruits to a new division supporting the statistics head of profession. Our division, which has a mix of statisticians at different grades, aims to transform UKHSA statistical production and dissemination. We have all produced statistics in other government departments, and we use that expertise to provide advice, guidance and practical support to all aspects of statistics production and dissemination. Our division also includes two content designers who actually publish UKHSA’s official statistics.

One of the most important parts of our work is a programme of reviews looking in-depth at each UKHSA official statistics publication. This is a big programme of work, encompassing around 35 statistical series covering a range of topics and including weekly reports right through to annual reports. The reviews aim to:

  • bring consistency to our statistics production and outputs.
  • improve efficiency and quality assurance through the adoption of reproducible analytical pipelines (RAP) in line with our RAP implementation plan.
  • improve compliance with the Code of Practice for Statistics.
  • embed user engagement as a regular and standard activity.

We are part of the way through this programme, having reviewed around 20 series and with another 15 to go. We expect to finish the reviews by late summer 2024.

How do we review our statistics?

Our reviews have three main phases.

  • Desk-based research includes assessing products against the Code of Practice for Statistics, assessing publications for accessibility and clarity, reviewing desk notes and analysing Google Analytics to draw out insights about users.
  • Discovery work with the team explores the journey from data acquisition through to publication, understanding the processes used and the quality assurance in place. Sometimes we shadow a production cycle to really understand how the process works and how it can be improved. We also discuss user engagement with the team to investigate what they know about their users and how they assess any changing or emerging needs.
  • Once we have all the information we need, we write a report to summarise our findings and agree recommendations for improvement with the production team.

How do we make changes in practice?

We work with the production team, providing practical support, training and guidance as they implement the recommendations. We are aiming for incremental improvement, and the review provides a baseline against which we can measure success.

The reviews have given a terrific insight into the good practice within our organisation, as well as the challenges of producing some of our statistical products and the legacy processes that now need updating. Despite all the challenges of producing statistics during the pandemic, UKHSA statistics teams have been putting out very detailed and thorough statistics, in some cases on a weekly basis and with very short turnaround times between receiving data and publishing. Google Analytics indicates that, in general, readership is high, and user engagement so far has shown that products are highly valued and appreciated by users working in health protection and healthcare settings.

Areas for improvement are often similar across different statistical series. For example, production methods are not as reproducible as they could be. There are opportunities to introduce reproducible analytical pipelines (RAP) and build in automated quality assurance that will improve the efficiency and accuracy of production. We’ve also found that most outputs are aimed at a technical and clinical audience, which limits their impact for the general public and their contribution to the public good. As an organisation, we need to do more to understand the wider uses of our statistical products and adjust their presentation accordingly.
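As a rough illustration of what automated quality assurance within a RAP can look like, the sketch below runs simple validation rules over a table before release. The field names and rules here are hypothetical, not UKHSA’s actual checks:

```python
# Minimal sketch of automated QA in a reproducible analytical pipeline (RAP):
# validate the data programmatically before publication, so the same checks
# run identically every production cycle. Field names are invented.

def qa_checks(rows: list[dict]) -> list[str]:
    """Return human-readable QA failures; an empty list means all checks passed."""
    failures = []
    for i, row in enumerate(rows):
        if row.get("cases") is None:
            failures.append(f"row {i}: missing case count")
        elif row["cases"] < 0:
            failures.append(f"row {i}: negative case count ({row['cases']})")
        if not row.get("week_ending"):
            failures.append(f"row {i}: missing week_ending date")
    return failures

data = [
    {"week_ending": "2024-03-01", "cases": 120},
    {"week_ending": "", "cases": -5},
]
for failure in qa_checks(data):
    print(failure)
```

In a real pipeline a non-empty failure list would halt publication, turning quality assurance from a manual inspection step into a repeatable, auditable gate.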

What difference are the reviews making?

One of the key benefits of the reviews has been building relationships between our division and UKHSA statistics teams. Our work only really has impact where teams get on board and are enthused to make positive changes. So I’m delighted about the impact our work is having, with statistics teams working hard to improve their products. For example, charts are being redesigned to conform with best practice, RAP is being implemented in some publications, our first quality and methodology information report has been published and most publications are now in HTML. It all feels very positive!

So, what next?

We have more reviews to do to make sure we have an accurate picture of all our official statistics, and lots of opportunities to support production teams to make improvements. We’re planning more user engagement and also participating in the cross-government user consultation on health and social care statistics which will give us some valuable user feedback to help shape our statistics in the future.

We need to decide whether any publications designated on GOV.UK as ‘research and analysis’ should really be designated as official statistics. And as we bring about improvement right across the UKHSA statistics publications, we will be aiming for more products to be accredited by OSR to provide that stamp of approval that we’ve done the right things and are now meeting the highest standards of trustworthiness, quality and value.

If you’re interested in talking to us about our programme of reviews, please do get in touch. We’re happy to share more about what we’ve learnt, as well as the materials we’ve developed to support the review process.

You’re planning to do what? Statistics, resource constraints and user engagement

In our latest blog, Mark Pont, OSR’s Assessment Programme Lead and Philip Wales, NISRA’s Chief Executive discuss engaging with statistics users and how user input can help decision making… 

In his recent blog post about keeping a statistical portfolio (and a garden) sustainable, Rob Kent-Smith described some principles to consider when balancing scarce resources across a portfolio of statistics. In this post, Mark Pont, head of our compliance programme, brings this to life a little, drawing on the recent experiences of NISRA – the Northern Ireland Statistics and Research Agency. Faced with some budgetary pressures, NISRA launched a consultation at the end of August 2023 and published a response just a few weeks ago.

The Code of Practice for Statistics talks about the need to ensure that statistics remain relevant to users. The need for statistics producers to engage with users to understand their evolving needs is an important element of providing value.  

A myth that we sometimes hear at OSR is that accredited official statistics (the new name for what the Statistics and Registration Service Act 2007 calls National Statistics) can’t be stopped. We also hear that statistical outputs can only be added to. But it is important to recognise that just because a set of statistics has been produced in the past, this doesn’t mean that it must continue to be produced in the same way, with the same periodicity, for evermore. Nor does the Code of Practice mandate a particular form of presentation or requirement for extensive commentary, as long as users’ needs are being met. To carry on the gardening metaphor, sometimes plants need pruning or even removing to enable a garden to flourish.

It’s therefore right that all options – reducing scope or frequency, or ceasing altogether – be considered.  

It’s also really important to recognise that a formal public consultation can form an important part of gathering users’ views. But this is best done within the context of more proactive, ongoing engagement, particularly with key decision-makers.

It was therefore really good that Philip Wales, NISRA’s Chief Executive, contacted us to tell us about NISRA’s consultation. In the rest of this post, we ask Philip for some perspectives on how the consultation went, about how he went about engaging with users, and how their input helped his decision making.   


Mark: So Philip, first of all congratulations on the new role, which perhaps isn’t so new any more. How did it feel to be thrown straight into needing to make some tough decisions in the light of tight budgets? 

Philip: Thanks Mark – it’s been a challenging first ten months at NISRA, but I’ve really enjoyed it, and the time has flown by.  

You’re right to say that we faced – and continue to face – budgetary pressures at NISRA. Funding from our parent department will be around £1.9m lower in nominal terms this year compared to 2022-23. Because of inflation, that amounts to a real terms cut of close to 20% for our suite of economic, population and social statistics, not to mention our survey and data collection activities. 

To resolve this financial pressure, we’ve worked hard to find new sources of income and to move people into posts with dedicated funding, we’ve had to manage our resources well, and to think hard about our suite of outputs, which brings us to the consultation exercise we ran.  

Mark: How did you feel the consultation went? Was there anything that particularly pleased you about it, or its findings? 

Philip: The consultation we ran on our statistical outputs was an important part of managing our budgetary pressures. It gave us a chance to explain the financial context and communicate the pressures which NISRA is under to our users, and to talk about how we would manage them. It also encouraged us to think critically about the work we do and where we add the greatest value.

The consultation proposed changes to some of our planned outputs – either delaying them, scaling them back or suspending them – and enabled us to get feedback directly from our users.  

And on these terms, I think the consultation was a success. We had a large number of responses – from individuals, institutions, businesses and other organisations, as well as government departments – all of whom took the time to tell us that they really value the outputs we produce.  

From the feedback we got, we learned about where and how our releases are used, and we secured a better understanding of the potential impact of our proposed changes. Importantly, that feedback has helped to guide the changes we’re now making to our outputs.  

Mark: How has the consultation helped you to decide which activities and outputs to prioritise? And did you end up cutting back in the areas that you expected? 

Philip: The consultation helped us to work out how to minimise the impact of our proposals on our stakeholders.  

In lots of cases, users agreed that the changes we were proposing were the ‘least worst’ option available. Where we were combining outputs, or scaling them back to focus on the core headlines, users were understanding. Feedback also indicated that, in general, the outputs we were suspending were adding less value than our other activities: a sign we were focussing on the right things.  

Where we did meet real concern and resistance – particularly on some of our hospital infections releases and elements of our trade data suite – we listened. In these cases, we sought new and less resource-intensive ways of meeting these needs.

For me, this is the hallmark of a good consultation: asking people for feedback, listening, and then adjusting to account for their views.  

Mark: What were the most difficult parts of running the consultation? 

Philip: Well, it’ll be obvious that running a consultation like this – against a challenging financial background – isn’t a lot of fun!  

But I think the most difficult part of this process was the beginning. Sometimes, as producers, we can be a bit reluctant to ask the question ‘should I keep doing this?’ A bit like Rob said in his recent blog-post, it’s easy to think that an output should continue simply ‘because it has for a long time’, or ‘because it’s an Accredited Statistic’. People are protective of their outputs and can often be anxious when changes are discussed. 

I think this is a natural reaction, but it’s often an obstacle that we put in front of ourselves. The truth is that the skills of statisticians and analysts more broadly can be deployed in all kinds of really important ways across government, and we need to be thinking about how best to use those capabilities all the time. New datasets, new systems or new activities all mean there are new ways for skilled data analysts of all kinds to add value. In place of a guarded, defensive discussion, this lens really helped to promote the right kind of open discussion about outputs at NISRA.  

Mark: In conclusion, would you have any tips for others in a similar position? 

Philip: I think I’d give three short pieces of advice to someone undertaking an exercise like this one.  

Firstly, always keep in mind that making changes to an output isn’t a reflection on the people doing the work. Try and have an open, respectful conversation which captures the value that they can provide, recognising that sometimes less can be more.   

Second, trust and listen to one another. Changes like these are more likely to stick and less likely to have long term impacts on morale if it’s clear that you are doing this as part of a group.  

And third, trust your users. Listen to what they have to say and leave room to adjust your plans if you get unexpected feedback.  

If you want more advice about engaging with users as part of prioritisation exercises, please do contact us. 

NHS England guest blog: Mental health data quality

In our latest guest blog Gary Childs, Head of Analytical Delivery at NHS England, discusses steps NHS England has been taking in recent years to improve the quality of mental health data…


Within NHS England we have been striving to improve the quality of mental health data for a number of years. In this blog we will focus on the Mental Health Services Dataset (MHSDS), but the methodologies and principles apply equally to datasets reporting on topics such as NHS Talking Therapies or Learning Disabilities and Autism (LDA).  

Clinical Data in National Collections

Charles Babbage once said: “Errors using inadequate data are much less than those using no data at all”. However, there must be a threshold: poor-quality data can lead to inaccurate analytics and bad decisions, and in health it can have an impact on patient care. The Government Data Quality Hub states that: “Good quality data is data that is fit for purpose. That means the data needs to be good enough to support the outcomes it is being used for”.

In health, one would automatically assume that data would be of the highest quality as it is captured in clinical and operational systems for the purpose of direct patient care, and that assumption is probably true. The problem comes when you need to use this data to get a national picture of performance and for secondary uses. 

This requires data to be extracted from these clinical and operational systems in a standardised format for aggregation at a national level. Taking mental health as an example, there are probably over 500 providers of mental health services (excluding NHS Talking Therapies), many of them relatively small; the actual number is unknown. Of those providers, we have identified at least 30 distinct IT systems in use (such as SystmOne, Epic and RiO), as well as many in-house systems. The data within these systems is held in differing structures and formats, and some key information is captured as free text.

Creating the national collection for mental health (the MHSDS) requires providers to make monthly record-level submissions to NHS England. The provider must use the technical output specification (TOS) and user guidance to understand the scope and definition of each data item to be submitted. In addition, they have to familiarise themselves with the MHSDS intermediate database to understand how data items are grouped for the data submission file. To achieve this, providers carry out a ‘data mapping exercise’ to understand how well their existing systems align to the MHSDS TOS and take appropriate action to ensure that the standard is fully met. As mental health is a multifaceted service covering many policy areas, the data is complex and the submission process can be arduous, especially for smaller providers.
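In spirit, a data mapping exercise amounts to comparing the data items required by the TOS with the fields a provider’s system already captures. A hypothetical sketch (the item names below are invented for illustration, not the real TOS):

```python
# Illustrative sketch of a data mapping exercise: compare the data items a
# technical output specification requires against the local fields a provider
# has mapped so far, and report the gap. All names are hypothetical.

tos_items = {"LocalPatientId", "ReferralDate", "PrimaryDiagnosis", "TeamType"}
local_fields = {
    "LocalPatientId": "patient_id",  # TOS item -> provider's local column
    "ReferralDate": "ref_date",
}

mapped = tos_items & local_fields.keys()
unmapped = sorted(tos_items - local_fields.keys())

print(f"Mapped {len(mapped)} of {len(tos_items)} required items")
print("Still to map:", ", ".join(unmapped))
```

A gap report of this kind tells the provider which data items need new capture or transformation work before their submissions can fully meet the standard.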

Supporting Providers to Submit Data

A key focus has been on increasing the number of mental health service providers submitting to the MHSDS, which now stands at over 370 providers each month, covering an estimated 99%+ of all NHS funded activity. This is an improvement from 85 providers in 2016, achieved through a variety of initiatives.

Understanding who should be submitting is key; once they are submitting, so is knowing what services they provide and therefore what data they should be submitting. A Master Provider List is maintained that identifies all known providers in scope of the MHSDS, together with their submission behaviours. Data submissions are tracked throughout the submission window, which, together with historic submission behaviours, results in tailored communications being sent to providers to encourage more positive behaviours.

Ensuring All Data is Submitted

In collaboration with the CQC, regional leads and providers, it has been possible to identify the services that are being delivered by most providers. This has allowed us to assess whether providers are submitting all relevant data. This has been a particular problem with Independent Sector Service Providers (where they are delivering NHS funded services) and has required the intervention of DHSC and Health Ministers. 

Improving the Quality of Data

Once providers are submitting data across the service lines that they provide, we can assess the quality of that data and support providers to improve it. This starts with self-service tools: at the point of submission, providers receive a line-by-line data quality assessment. This can be a daunting report, hence a Validation and Rejection Submission Tool was developed that converts the record-level submission report into an easy-to-understand summary of the issues, with instructions on how to fix them.
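A tool of this kind essentially groups record-level failures by validation rule and reports a count for each. A minimal sketch, with invented error codes rather than the real MHSDS validations:

```python
from collections import Counter

# Hypothetical sketch of turning a line-by-line validation report into a
# summary grouped by issue type. Error codes and messages are invented.

record_errors = [
    ("VAL001", "missing NHS number"),
    ("VAL002", "invalid referral date"),
    ("VAL001", "missing NHS number"),
    ("VAL001", "missing NHS number"),
]

def summarise(errors):
    """Count how many records fail each validation rule, most frequent first."""
    counts = Counter(code for code, _ in errors)
    messages = dict(errors)  # one representative message per code
    return [(code, messages[code], n) for code, n in counts.most_common()]

for code, message, n in summarise(record_errors):
    print(f"{code} ({message}): {n} record(s)")
```

Grouping thousands of record-level messages into a handful of counted issues is what turns a daunting report into something a submissions team can act on.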

Providers can resubmit the data as many times as they want within the submission window, improving the data each time. However, there are occasions where data issues are identified after the submission window has closed. These can affect the quality of the data for that month, but also have a knock-on effect on future monitoring, such as 3- or 12-month rolling metrics. To address this, a multiple submission window model (MSWM) was implemented to allow providers to address data quality issues throughout the financial year. Use of the MSWM is closely monitored and reported upon to avoid abuse of the facility, as it should be a last resort. 
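The knock-on effect on rolling metrics is easy to see with a small worked example: correcting one month's figure changes every 3-month rolling total that includes that month. The monthly counts below are made-up numbers for illustration.

```python
# Sketch of why a late data correction matters for rolling metrics: a fix
# to one month's figure changes every 3-month rolling total containing it.
# Monthly counts are invented.

monthly_referrals = [100, 110, 90, 120, 130]

def rolling_3m(values):
    """3-month rolling totals, one per complete 3-month window."""
    return [sum(values[i - 2 : i + 1]) for i in range(2, len(values))]

before = rolling_3m(monthly_referrals)

# A resubmission in a later window corrects the third month's figure
monthly_referrals[2] = 105
after = rolling_3m(monthly_referrals)

print(before)  # [300, 320, 340]
print(after)   # [315, 335, 355]
```

All three rolling totals that include the corrected month change, which is why late corrections need a managed route such as the MSWM rather than being ignored.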

To illustrate the quality and compliance of the data, it is surfaced within a data quality dashboard that reports on the quality of each data item submitted by each provider, allowing for comparisons at provider, regional and system supplier level. In addition, to promote the compliant use of SNOMED (a structured clinical vocabulary for use in electronic health records), relevant data items are reported upon within a SNOMED dashboard. The dashboard assesses how much SNOMED data is flowing to the MHSDS, to which data items and tables, and from which providers; there is also a focus on correctly identifying procedures and assessments. 
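A metric such a dashboard might surface is the share of submitted records that carry a SNOMED code, broken down by provider. The sketch below illustrates that calculation only; the records, provider names and codes are invented, and the real dashboard's metrics are richer than this.

```python
# Illustrative calculation of SNOMED coding coverage by provider.
# All records and identifiers are hypothetical.

records = [
    {"provider": "Provider A", "snomed_code": "386053000"},
    {"provider": "Provider A", "snomed_code": None},
    {"provider": "Provider B", "snomed_code": "225337009"},
]

# Tally (total records, records with a SNOMED code) per provider
coverage = {}
for r in records:
    total, coded = coverage.get(r["provider"], (0, 0))
    coverage[r["provider"]] = (total + 1, coded + (r["snomed_code"] is not None))

for provider, (total, coded) in sorted(coverage.items()):
    print(f"{provider}: {coded}/{total} records SNOMED-coded ({coded/total:.0%})")
```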

To reinforce these tools, providers receive an automated data quality report by email each month. These reports summarise the key issues with a provider’s data and suggest strategies to fix them. Providers can also access policy-specific guidance in the form of workshops, webinars and documentation; this has previously focused on topics such as eating disorders, restrictive interventions, problem gambling, perinatal and maternal services, and memory clinics. In addition, providers have received questionnaires to help us better understand where they need more support. 

Talking About Data Quality

While all these tools facilitate better data, it is direct engagement with providers by the data liaison team that can have the biggest impact. Data quality analysis identifies the providers experiencing the biggest challenges, and the data liaison team uses this to provide tailored support on a one-to-one basis. This was particularly successful during the recent Advanced cyber incident, which impacted the data of a variety of mental health service providers for almost nine months. 

Next Steps in Improving Mental Health Data

At first sight, all these solutions may seem excessive. However, the data is the foundation for decisions relating to commissioning, service improvement and service design; it supports research and innovation, and helps us understand the impact of mental health care on patient outcomes and experiences. Through improvements in data quality, we have been able to close several duplicate collections and can now move to a single window for data submission. This will soon allow insights to be delivered a whole month earlier, making decisions timelier and more relevant. 

At the start of this blog, I stated that the data we are referring to is secondary uses data, but that data originally came from the clinical and operational systems used for direct patient care. As we know that this data is of higher quality than that within the MHSDS, we must find a way to mitigate the degradation in quality that we are currently seeing. Initiatives of this nature are being explored within NHS England in the hope that we can improve data quality further, and make the data even timelier as well as easier to collect.