Futureproofing the Code of Practice for Statistics

OSR has completed its review of the Code of Practice for Statistics

The first Code of Practice for Statistics was published in January 2009. In February 2018, OSR published version 2.0 of the Code of Practice for Statistics after a significant review and refresh of the standards we expect for the production of data and statistics. Since then we have seen a lot of change in the data landscape, increased demand from users for statistics, and changes in the ways statistics producers are working.

We think this is an ideal time for us to review the Code to make sure it remains relevant and to identify opportunities for us to make improvements.

From September to December 2023, OSR gathered feedback from stakeholders on the Code of Practice and ran a range of online sessions for all interested parties, exploring key topics relevant to the Code. A summary of these events can be found at the bottom of this page.

We have reviewed all the responses we received and published a response paper summarising the findings.

Considering the need for change

The second edition of the Code of Practice for Statistics was released in February 2018. It established a framework for the standards of statistics production grounded on three core principles or ‘pillars’:

  • Trustworthiness – confidence in the people and organisations that produce statistics and data
  • Quality – data and methods that produce assured statistics
  • Value – statistics that support society’s needs for information

Since that time the Code has become firmly embedded in the work of official statisticians, and it has been adopted by a community of practitioners beyond official statistics.

Our review, The State of the Statistics System, has highlighted how well producers have responded in recent years to urgent needs for data and statistics, and how they have continued to innovate in challenging circumstances – such as during the COVID-19 pandemic and since Russia’s invasion of Ukraine in February 2022. However, declining response rates, sample biases and data privacy concerns can have a significant impact on the quality of statistics. In a wider landscape of technological advances, statistics need to remain relevant, accurate and reliable – the increasing use of new and alternative data sources and advances in technology are opportunities for the statistical system to embrace.

The role of the Code is to provide a clear steer for those producing statistics on the standards to be applied so that statistics command public confidence. We have heard from stakeholders and interested parties across a wide range of settings on their thoughts about the suitability of the Code and on how it can be adapted to meet future challenges and opportunities. The information provided will inform OSR’s decision making on whether changes are required to the Code. Our call for evidence will also inform how we support organisations that produce statistics and wish to apply the standards of the Code in a voluntary way.

Event Summaries

Below you will find summaries and video recordings of events related to futureproofing the Code of Practice for Statistics.

On 13 September 2023, the Office for Statistics Regulation (OSR) held a launch event for our review of the Code of Practice, to ensure it continues to effectively serve the production and regulation of government statistics. One of our newly recruited regulators, Luke Boyce, summarises the event and what was discussed.

When I first joined OSR, I was excited to hear about the upcoming project to review the Code and ensure it remains relevant. Coming directly from a statistics team elsewhere in government, I was already familiar with the Code of Practice and its importance in relation to statistics production in government.

However, getting to apply the Code in my regulatory work these last four months has given me a newfound appreciation for its importance, especially the value pillar. Without value, statistics in government can’t serve the public good, but without a Code that reflects the current statistical system, this mission is difficult to achieve.

Clearly I wasn’t the only one excited to hear about what’s next for the Code of Practice for Statistics: almost 300 people, from across government and the general public, were in attendance. The last full refresh of the Code of Practice was five years ago, in 2018, and the consensus among guest speakers, OSR colleagues and those who participated in the Q&A was one of strong enthusiasm for adapting the Code to underpin a statistical system that is rapidly changing.

Topics covered included Artificial Intelligence (AI) and its effect on society, the rise in administrative data use, live data and dashboards, and data linkage across government. Some of these topics will also be covered in later events during the Code review, so keep an eye out for them if you’re interested.

It was great to hear from a variety of speakers, from both inside and outside government, about how the Code of Practice affects their work.

Tracey Brown, Director of Sense About Science, talked about their mission to increase the public’s knowledge of evidence, which is directly in line with the Code of Practice, especially the trustworthiness and value pillars. She talked about the public’s increased interest in statistics in the post-pandemic world, why this puts increased emphasis on official statistics serving the public good, and why it’s important that we update the Code in line with the world we now live in.

Catherine Hutchinson, Head of the Evaluation Task Force at the Cabinet Office, talked about how trust in government – and the willingness of decision makers in government to use the evidence provided – can be heavily reinforced through intelligent transparency and tackling the misuse of statistics, using the pillars of the Code. She explained that evaluating policy and operational decisions before full national or larger-scale implementation is important, as it ensures public money is spent effectively. This requires quality statistics and evaluation reports that the public can access freely, which an effective code of practice can enable.

Stephen Aldridge, Director for Analysis and Data at the Department for Levelling Up, Housing and Communities, described how the Code of Practice informs everyday work for every analytical team in government, and highlighted the need for a Code that takes into account the appropriate use of new technologies and techniques, such as AI and cross-government data linkage, to enable analysts to carry out new, innovative work.

He also highlighted how the Code can support all analytical work, including published management information. He argued that flexibility in the application of the Code is important, to ensure it is easier to apply to different types of statistics, including those outside government. He added that data dashboards are a valuable emerging tool in government statistics that allow the public to access live data rather than having to wait for infrequent releases – however, these dashboards can sometimes miss vital insight and commentary. A Code refresh could emphasise the importance of demonstrating trustworthiness outside of a traditional bulletin and allow official statistics to exist in a live format.

At the end of the event there were many questions, and much enthusiasm for the Code review. Questions included how the review will address the growing interest in real-time data, how it can enable the development of statistics that serve the public good rather than being tied only to policy priorities, and how the Code of Practice applies to published government figures that aren’t produced by statisticians.

The launch event was just the start of the Code Review. On Monday 18 September 2023 OSR launched an online Call for Evidence for you to share your feedback with us about the Code. There will also be several more panel events, focussing on areas including data quality, data ethics and AI, and user demands. I’m really excited to attend these events and I hope to see you there.

Panel 1 recording: Maintaining data quality

Maintaining data quality – an event summary

Data quality is critical for all statistics – get it wrong and there is a massive risk that users will be sent off in the wrong direction and perhaps seriously misled. It was the subject of OSR’s first Futureproofing the Code panel session. We were keen to hear from experts about their perspectives on some of the challenging issues influencing statistical practice today.

We had a great line-up for the panel: Iain Bell (National Director for Public Health Knowledge and Research at Public Health Wales), Sarah Henry (Director of Methodology & Quality at the Office for National Statistics) and Roger Halliday (Chief Executive of Research Data Scotland), rounded off by the pre-eminent Professor David Hand (Imperial College and Chair of the National Statistician’s Expert User Advisory Committee).

These are experts with wide-ranging experience, which provided a rich background to answering our exam question: ‘In the light of concerns about survey response rates, use of personal data, and wider perceptions of the loss of trust in institutions, what can be done to manage risks to data quality?’

Iain reminded us of the importance of being transparent about quality to build others’ confidence in data and statistics. He kicked off by reminding us that there are no perfect data, and there never have been. The statistician’s job is to find out the pros and cons of data sources. Iain emphasised the importance of having better clarity about the data that institutions actually hold and need. He said that statisticians need to take greater responsibility for data and for transparency in the ways they work – they need to apply the fundamental principles of being open, admitting when things aren’t as high quality as they’d like.

For Sarah, data quality is close to her heart and she has a passion for surveys. She strongly defended their value and emphasised their continued importance. While there is a wide variety of source types (including many new ones, such as data from barcodes), the more traditional sources such as surveys are still essential. Understanding the sampling frame is very important and gives us confidence in the validity of the data, even when we have smaller samples. Analysts need to be able to quantify and quality assure the data – and they need more independent sources to achieve this and verify the data. Survey data can also help analysts better understand administrative data, as happened during the COVID-19 pandemic. Sarah emphasised the benefits of survey data in establishing a stronger connection with the respondent: doing so can help improve response rates if we better explain the purpose of the data collection. Getting a rich, granular picture of the data can better inform decision making.

Roger focused on improving data quality and statistics through use and sharing. He emphasised that having an independent source is not enough to accurately verify quality, and there can be high risks in decisions being made on poor quality data. Roger highlighted the benefits of analysts getting out of the office and actually visiting those providing data – he has found that putting a face to the data is valuable. Using and combining sources, such as bringing together data from public bodies, can help address poor quality. Developing a plan for what can be done now, as well as in the medium and long term, is also important.

David reminded us that how quality is viewed is centrally tied to purpose – what is good data for one purpose may be poor for another. Statistics are not a static output but a changing one, and it could be worth considering a move from being process oriented to being product oriented. David felt that there is a need to place more emphasis on local data and to connect with users. There are new opportunities as business productivity rises, and the use of new data types and sources is encouraging new technologies. The adoption of these technologies, such as AI and language models like ChatGPT, also influences what users can do and what they need – and these change over time. A key question for users to address is what we are trying to do and what we want the data for.

Panel 2 recording: Data Ethics and AI

Data ethics and AI – an event summary

We heard about the challenges of deep fakes and scientific misinformation, as well as the seven deadly sins in the big and open data house. And, thankfully, we learnt some steps that can be taken to counter them, including the benefits of the UK Statistics Authority’s data ethics principles, which fit neatly with the Code of Practice for Statistics.

Our speakers were Areeq Chowdhury from the Royal Society, Sabina Leonelli from the University of Exeter, and Helen Boaden, chair of the National Statistician’s Data Ethics Advisory Committee.

Areeq talked through some of the current challenges with scientific misinformation and around AI, highlighting the societal harms, the need for honest and open discussion, and support for fact checkers. He flagged the importance of challenging assumptions and holding platforms to account. The need to engage the public is not limited to times of emergency but should be continuous. He highlighted that there can be an over-correction, with a tendency for organisations to be cautious and over-apply data protection regulations. Other technical solutions are important too – using standardisation to support wider data use and establishing trusted research environments.

Areeq illustrated the challenges of generative AI and the creation of deep fakes. He highlighted some ways to mitigate the impact, such as establishing digital content provenance through verification. The Royal Society is involved in a red-team challenge of large language models to test their guard rails. Areeq also emphasised the importance of looking across disciplines to consider AI safety risks, and highlighted the difficulty for multilingual communities in receiving and understanding information. Watch the recording of the session to see Areeq’s own deep fake!

Sabina highlighted the rise of inductive reasoning and the logic of discovery as “data accumulation”. This logic suggests the more data the better for generating evidence and knowledge, with comprehensive data collection being a form of control. There is a wide appeal to, and perhaps a mythology surrounding, big data. Data are made, not given; they are partial, not comprehensive, and qualities do not always reduce to quantities.

Sabina described the seven deadly sins for big and open data houses: conservatism (the problem of old data), a house of cards (with unreliable data), convenience sampling (with partial data that is selective and reinforces inequalities in the digital divide), self-interest (the problem of dishonesty and a lack of regulation applying to the dissemination of data), environmental damage (unsustainability and pollution from storing masses of data), and global inequity (with the problem of unfair data). She emphasised the importance of debunking big data mythologies.

Helen suggested that there is a lot that can be learnt from applying data ethics principles, both for researchers and for OSR in strengthening the Code. She introduced the principles from the Centre for Applied Data Ethics in the UK Statistics Authority, which can be used by researchers considering the use of AI. She emphasised the benefit of using the principles to minimise potential harms, as well as to enable researchers to analyse data efficiently. The principles also help in managing confidentiality and data security, ensuring the appropriate agreements are in place, promoting public engagement in the use of the research, and identifying the benefits. They help underscore the importance of being transparent about the access and sharing of data. The principles can be applied to a huge range of ethical concerns in different ways and are frequently applied to novel research, including various elements of AI.

Helen emphasised that understanding the context around the data and research is important for effective ethical practice. It also relies on collaboration, which can be international as well as national – AI goes beyond boundaries, and there is a shared responsibility to use technology both ethically and appropriately.

Panel 3 recording: Changing user demands for data

Changing user demands for data – an event summary

“How can official statistics remain relevant in the face of changing user demands for data?”

Our speakers were Neil McIvor, Chief Data Officer for the Department for Education (DfE); Dr Janet Bastiman, Chief Data Scientist at Napier AI; and Professor Sir Ian Diamond, the National Statistician.

What are real-time data? Neil got us to think about what is meant by this and identified three different types of data that are worth considering for official statistics. Actual real-time data are instantaneous, like Formula One cars, where you need real-time information to make operational decisions in a split second. Then there are near real-time data, such as overnight batch data from an operational system. Lastly, Neil described a related type, timely data, where the latency between the publication of a statistical product and the reference period that product relates to is reduced.

Neil highlighted that, as a whole, real-time data come from operational systems where statistics is not the primary purpose. There is a need for timely statistics and analysis, but statisticians need to think about the whole end-to-end process, not just the end-use perspective. Neil described the DfE data project during the pandemic to generate a real-time national picture of school attendance, where previously lagged data had been sufficient. DfE was able to set up APIs to collect attendance data from schools with zero burden on the schools and with an option to drop out if they prefer. DfE worked out what release frequency was needed and used the data to understand how things were changing. It provided timely insights on the day of industrial action, which helped the debate focus immediately on the issues.
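To make the shape of such a pipeline concrete, here is a minimal sketch of automated, API-based collection and aggregation. Everything in it – the endpoint, the field names, the opt-out convention – is hypothetical, not the actual DfE system.

```python
# Illustrative sketch only: the endpoint, field names and opt-out convention
# are hypothetical, not the actual DfE implementation.
import requests

API_BASE = "https://example.gov.uk/attendance-api"  # hypothetical endpoint


def fetch_school_attendance(school_id: str, date: str) -> dict | None:
    """Fetch one school's attendance counts for a given date.

    Schools that have opted out are assumed to return HTTP 404 and are skipped.
    """
    resp = requests.get(
        f"{API_BASE}/schools/{school_id}", params={"date": date}, timeout=10
    )
    if resp.status_code == 404:  # school has opted out
        return None
    resp.raise_for_status()
    return resp.json()  # e.g. {"present": 412, "enrolled": 450}


def national_attendance_rate(school_ids: list[str], date: str) -> float:
    """Aggregate school-level counts into a national attendance rate."""
    present = enrolled = 0
    for school_id in school_ids:
        record = fetch_school_attendance(school_id, date)
        if record is not None:
            present += record["present"]
            enrolled += record["enrolled"]
    return present / enrolled if enrolled else float("nan")
```

Because each school exposes the same interface, the aggregation step stays trivial, and the release frequency becomes a publishing decision rather than a collection constraint.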

Janet shared her experience of working in a regulatory technology firm that supports financial institutions with their regulatory compliance. Financial crime is a huge issue across the world and monitoring for suspicious activity is essential – if a bank doesn’t detect the crime, it can be subject to financial penalties. The crimes affect everyone, and it is in everyone’s interests to address them. There are challenges, though: definitions can be vague and very qualitative, which can make it hard to detect crime. And of course, criminals will take steps to mask their actions.

Real-time data needs in the financial sector are very different from the DfE example: real-time means instant, while near real-time means small fractions of a second. Her firm screens for red flags in near real time, in a setting in which the statistics and analysis need to constantly evolve because criminal enterprises are motivated, funded and constantly trying to evade detection. They take a multi-faceted approach, applying rule-based statistics for sub-second checking of the data, and using behavioural analytics and complex machine learning models to consider a wider range of data and transactions. Their approach gives compliance teams increasing levels of detail at the right time, so that they can investigate verifiable issues straight away.
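The layered design Janet described – cheap rules first, slower models after – can be pictured with a small sketch like the one below. The thresholds and rules are invented for illustration and bear no relation to Napier AI’s actual checks.

```python
# Illustrative sketch of a rule-based screening layer; the thresholds and
# rules are invented and bear no relation to Napier AI's actual checks.
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float      # transaction value
    country: str       # counterparty country code
    daily_count: int   # transactions from this account so far today


HIGH_RISK_COUNTRIES = {"XX", "YY"}  # placeholder jurisdiction codes


def red_flags(tx: Transaction) -> list[str]:
    """Return the rules a transaction trips; an empty list means it passes.

    Each check is a cheap comparison, so large volumes of transactions can
    be screened in well under a second before any slower models run.
    """
    flags = []
    if tx.amount > 10_000:
        flags.append("large-value")
    if tx.country in HIGH_RISK_COUNTRIES:
        flags.append("high-risk-jurisdiction")
    if tx.daily_count > 50:
        flags.append("unusual-frequency")
    return flags


# Transactions that trip a rule would be escalated to behavioural analytics
# and machine learning models for deeper, slower analysis.
print(red_flags(Transaction(amount=25_000, country="XX", daily_count=3)))
# ['large-value', 'high-risk-jurisdiction']
```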

Sir Ian spoke about the importance of statistics serving the public good and being trusted. He emphasised that statistical independence comes from methodological rigour, ethical guidelines, and the demonstration that statistics are in the public interest. He reinforced the importance of statisticians publishing their work, and doing so against agreed timetables. Sir Ian welcomed the work of DfE but highlighted the challenge of linking data from different departments, which is needed if you are trying to understand some of the complex social processes that affect people’s lives. ONS had been able to successfully link and analyse patient information and mortality data during the pandemic, to show the impact of disadvantage on disease patterns. Sir Ian stressed the importance of being able to overcome barriers to data linkage, to enable more insightful analysis like this.
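As a toy picture of the kind of linkage Sir Ian described, the sketch below joins two invented datasets on a pseudonymised identifier; the dataset names, columns and values are all hypothetical, not the ONS analysis itself.

```python
# Illustrative sketch of deterministic linkage on a pseudonymised identifier;
# the dataset names, columns and values are invented, not the ONS analysis.
import pandas as pd

patients = pd.DataFrame({
    "person_id": ["a1", "b2", "c3", "d4"],   # pseudonymised identifier
    "deprivation_quintile": [1, 1, 5, 5],
})
deaths = pd.DataFrame({
    "person_id": ["a1", "c3"],
    "cause": ["COVID-19", "COVID-19"],
})

# A left join keeps every patient record and attaches mortality data where
# a matching identifier exists in the second dataset
linked = patients.merge(deaths, on="person_id", how="left")

# Deaths by deprivation quintile (non-null causes only)
print(linked.groupby("deprivation_quintile")["cause"].count())
```

In practice, of course, linkage across departments is harder than a single merge: identifiers differ between systems and must be matched probabilistically, which is exactly why the barriers Sir Ian mentioned matter.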

Sir Ian noted the need for better real-time and near real-time data, as well as more timely data – having these would reduce the need for revisions caused by lagged data supply. There are particular difficulties in measuring migration within the UK and with other countries without the benefit of a population register to capture the moves. Work priorities within organisations can also make data sharing more challenging. Being clear with the public about the use case can help producers make a more persuasive case for the value of people sharing their data.

The session, chaired by RSS CEO Dr Sarah Cumbers, brought together a range of invited stakeholders to speak briefly about aspects of the Code of Practice for Statistics that they appreciate, as well as how they would like to see it evolve.

Ed Humpherson, the Director General for Regulation at the Office for Statistics Regulation, introduced OSR’s Code review – why it is being held and what OSR is looking to do, given that the second edition of the Code was published in 2018. He noted the substantial change in the world since that time: the growing need for, and interest in, using data; the greater integration of administrative data in developing methods; and five years’ worth of experience in applying the Code. OSR is open and keen to hear how others think the Code should evolve. This is the final event in a series run over the past few months, held specifically with the RSS, as a key body for statistics in the UK, to hear from stakeholders.

Paul Allin, the RSS honorary officer for national statistics, spoke about embedding the perspective of public statistics in the Code. He highlighted how the Code is important for producers, the regulator and users alike, in setting the standard operating procedures for producers. He also emphasised the need for it to shape the culture of organisations responsible for official statistics and beyond. He highlighted the importance of replacing a focus on government needs with a pivot towards public statistics. This starts from identifying the questions that need answering and being clear about the purpose. Key questions to consider are what statistics should be produced, whose needs are being met, and how the public are being informed on things that matter to them, rather than on what matters to the government. Paul said that this requires sustained engagement with users and potential users. He would like to see Value at the heart of the Code; this will rely on engaging in cultural change that involves users as well as producers.

Dev Virdee, a member of the RSS National Statistics Advisory Group (NSAG) and Chair of the Forum of Statistics User Groups, welcomed the review. He felt the pillars are important and a good framework, and he appreciates the focus on supporting equality and inclusiveness. The Census is a good example of integrating user engagement into the process, across users from different faith groups and protected characteristics, but Dev flagged that similar practice doesn’t happen across all statistics. There can be good engagement with institutions such as the Equality and Human Rights Commission, but not necessarily with the citizen groups themselves. Dev asked how the Code can help strengthen the involvement of users in these areas. He also raised a question about the international involvement of UK official statistics following withdrawal from the EU, and about the future of international comparison – other countries look to the UK, and we can learn from others, such as through peer review with other nations.

David Caplan, a member of RSS NSAG and of the National Statistician’s Advisory Committee on standards for economic statistics (NSEUAC), made a strong recommendation for strengthening accountability within the Code. He set out three main elements: giving an account, being held to account, and providing redress. Giving an account means producers providing full information about the statistics that are produced – the methods, judgements made, quality doubts and, ideally, measures of accuracy and reliability. Being held to account means having a full mechanism to engage meaningfully with the user community – a place where questions can be asked and responses given. Redress means changing things which aren’t right, with producers responding appropriately to the feedback. Getting this right will give statistics of greater value and higher quality, in which users and wider society can have greater trust.

Anna Powell-Smith, founder and director of the Centre for Public Data (CPD), welcomed having a code – having the standards can improve data quality and be inspiring. Requiring a contact for the statistics lead is useful and unambiguous. She finds the Trustworthiness and Quality pillars punchy and clear, but the Value pillar less so. Within Value, relevance and accessibility are the main areas that CPD encounters, as well as addressing data gaps; there are inconsistencies in access across the four countries of the UK. Anna’s main request was to define users formally in the Code, making it clear they are not just in government but also journalists, MPs, campaigners and citizens. She called for active, not passive, user research – she emphasised that it is not good enough for producers to wait for users to come to them; they should go out and do active research, interviewing people and meeting them where they are to find out their needs. Anna also wants published work programmes that set out user needs.

Simon Briscoe, a consultant and journalist, set out seven areas he would like to see OSR address:

  1. Get critical official statistics that are not meeting the National Statistics standard – such as crime, homelessness, migration and RPI – to do so
  2. The cost/benefit of statistics should be clear – OSR should get producers to state what statistics cost to produce
  3. The Code should have user engagement as its first principle as in the first edition of the Code
  4. OSR needs to be proactive in looking for issues
  5. The default should be of open release, with departments not allowed to hoard data – if it is non-disclosive the data should be published
  6. OSR should deliver material changes through clear demands with specific deadlines in its reports
  7. OSR should focus its resources on the critical issues rather than spreading itself too thinly by covering less important data

Olly Bartrum, senior economist at the Institute for Government, gave his perspective on the Code from a position of wanting public debate to happen on the merits of policy decisions rather than on disputes about the statistics. He likes that the Code emphasises the importance of statistics for informing decisions, and sees it as the responsibility of governments to publish those statistics. It is important to look at how statistics are analysed and then used to produce evidence – but there is no rule requiring evidence to be released. There is a lack of transparency around policy making and the evidence used to make decisions. Olly wondered where the line is drawn on whether the Code applies, and felt that we need to create a culture in which statisticians, analysts and policy advisers all work together to use statistics to generate evidence and then advice – there’s a need to look across the chain and at how it links together to achieve the ambitions in the foreword to the Code.

Tony Dent, from Better Statistics CIC, emphasised the importance of understanding the value for money of statistical sources. He gave the example of the ONS COVID-19 Infection Survey, which he said should have been compared with administrative data, particularly hospital admissions. Tony highlighted the absence of good practice guides on the minimum details to report about response rates for population surveys, and on how potential bias resulting from uneven response should be investigated and resolved through the weighting used to estimate population figures. Better Statistics would like to see response rates calculated within each geographical area used in the sample design and covering, as a minimum, sex, age, ethnicity, household size and some measure of deprivation, with mandatory reporting. The group also believe there is a need to strengthen the Code to reduce inadequately prepared and unclear statistics, to provide greater oversight of modelled estimates, to improve staff training in report writing – looking to reduce unnecessary complexity and improve understanding – and to review the regulatory regime, given the lack of penalty for breaching the Code.
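As a toy numerical illustration of the reporting Tony described – response rates within the subgroups used in a sample design, and weighting to counter uneven response – the sketch below computes subgroup response rates and simple inverse-response-rate weights. All figures, column names and subgroups are invented.

```python
# Toy illustration of subgroup response rates and inverse-response-rate
# weighting; all figures and subgroup definitions are invented.
import pandas as pd

sample = pd.DataFrame({
    "age_band":  ["16-34", "35-64", "65+"],
    "sampled":   [1000, 1200, 800],   # people invited to take part
    "responded": [330, 630, 580],     # people who actually responded
})

# Response rate within each subgroup used in the sample design
sample["response_rate"] = sample["responded"] / sample["sampled"]

# Inverse-response-rate weight: under-responding groups get more weight,
# so population estimates are not dominated by the most willing respondents
sample["weight"] = 1 / sample["response_rate"]

print(sample)
# The same calculation would be repeated for sex, ethnicity, household
# size, deprivation and each geographical area in the design.
```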

Alison Macfarlane, of the RSS Official Statistics Section and a member of NSAG, highlighted challenges in the health statistics area. Given that health is a devolved matter, there are complexities for users in using the statistics from the four nations, with perpetual reorganisations. There are also many NHS organisations and private health providers, including general practices. Alison noted that efforts had been made to bring together data, but they have led to widespread opposition. This was partly due to a lack of explanation and clarity over what data are gathered and shared, and it has led to mistrust. The result is less complete and less timely data for research.

Ed Humpherson concluded the event by summarising some of the key messages raised that had particularly struck him:

  • How to have an enforcement mechanism that is more overt
  • The need to bring out public statistics – not clearly set out in the Code at present, but closer to how we speak about it
  • The importance of transparency, ensuring the underlying numbers are released – an area where OSR has had some success with government departments (Intelligent Transparency) and something we can build into the Code
  • Similarly with serving the public good
  • The importance of having quality metrics and explanations of bias
  • What to do when there is a data gap

Ed then described OSR’s next steps – the call for evidence closes on 11 December 2023. OSR will review all the evidence gathered and publish a summary of what it has heard. It will then make a recommendation to the Authority Board on the way forward, aiming to announce what that is around March 2024.

Get in touch

If you would like to contact us regarding the review, please email us at regulation@statistics.gov.uk.