Improvement and innovation

Transformation and administrative data

Last year’s report highlighted several transformation programmes designed to improve the quality, efficiency and relevance of statistics. We noted it is important to understand and mitigate any risks emerging during the transition to new methods and data sources. This year, we have seen even more transformation activity, underscoring how important it is that producers remain alert to these risks and manage them actively.

Transformation programmes can help to ensure that statistics use modern methods and data sources, often allowing enhanced statistics to be delivered more efficiently. The use of administrative data is a key enabler of many of the transformation projects being delivered across the system, as data sharing and the capabilities to unlock these data become more widespread. Administrative data are collected for operational purposes, usually as part of the delivery of a service, with statistical use being a secondary purpose. These data are often created when people interact with public services, such as schools, the NHS, the courts, law enforcement agencies and the benefits system.

Administrative data are usually generated and maintained by public bodies in delivering front-line services, and many of these bodies have faced financial pressures. Producers have told us these pressures can affect data quality. It is important that producers understand any impacts on quality and, where they can, minimise the data burden placed on public bodies.

So that statistics on population better reflect changes in society and technology and meet user needs, ONS is developing its methods for estimating the size and makeup of the population using a new dynamic population model (DPM). The DPM uses a statistical modelling approach to draw from a range of data sources, including administrative and survey data. ONS is now producing admin-based population estimates for England and Wales using the DPM.
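
To give a flavour of the kind of approach involved, the sketch below shows a toy demographic accounting update that blends two noisy population signals. This is purely illustrative: the source names, figures and variances are invented, and ONS's actual DPM is a far richer statistical model than this.

```python
# Illustrative sketch only: a toy demographic accounting update of the kind
# that underpins dynamic population modelling. All figures, variances and
# source names are invented; this is not ONS's actual DPM implementation.

def combine_estimates(estimates):
    """Blend independent noisy estimates by inverse-variance weighting."""
    weights = [1 / var for _, var in estimates]
    total = sum(weights)
    value = sum(est * w for (est, _), w in zip(estimates, weights)) / total
    return value, 1 / total  # blended estimate and its variance

def update_population(stock, births, deaths, net_migration):
    """Cohort-style accounting identity: next stock = stock + net flows."""
    return stock + births - deaths + net_migration

# Two hypothetical signals for the current population of an area,
# expressed as (estimate, variance) pairs:
admin_based = (101_500, 400_000)   # e.g. derived from administrative records
survey_based = (99_800, 900_000)   # e.g. derived from a household survey

stock, var = combine_estimates([admin_based, survey_based])
next_year = update_population(stock, births=1_200, deaths=950, net_migration=300)
print(f"Blended stock: {stock:,.0f} (variance {var:,.0f}); next year: {next_year:,.0f}")
```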

Currently, the ONS admin-based population estimates are classed as official statistics in development – this highlights to users that they are statistics undergoing development and testing. We are working closely with ONS to determine whether the statistics will meet the professional standards set out in the Code for accreditation.

The DPM work is a significant development for the UK statistical system. The DPM production and development work is complex and challenging, and the teams behind it are working hard to deliver ambitious goals. ONS’s international migration statistics, which are in part an input into the DPM, are also going through an ambitious transformation programme. ONS has moved away from the International Passenger Survey as its main data source and is working towards an approach based on administrative data sources. We have been advising ONS as it develops these statistics, and in our most recent review we found that ONS has made progress towards meeting user needs, with significant improvements made to the statistics.

With different approaches to population statistics across the nations of the UK, it is important that the resulting statistics be joined-up and coherent across the UK. ONS should continue to work collaboratively with producers in Northern Ireland, Scotland and Wales.

The DPM work is happening alongside the transformation of ONS’s Labour Force Survey. ONS has been developing a Transformed Labour Force Survey (TLFS) for Great Britain with the aim of switching to it as its main source of labour market data. The TLFS is an enhanced version of the existing Labour Force Survey (LFS): an online-first survey with a ‘knock-to-nudge’ approach, in which field interviewers knock on a respondent’s door to encourage a response if they have not previously answered the survey. As well as increasing the achieved sample, these steps aim to make sure that the people who participate in the survey better represent the whole population. We are advising, challenging and supporting ONS as it develops this new approach.

Other notable transformation work includes that by NISRA on improving labour market statistics through survey transformation and increasing the use of administrative data and other data sources. NISRA intends to move to a new online-first Northern Ireland Labour Market Survey (NI LMS) in autumn 2024.

Scottish Government is taking a system- and profession-wide approach to the transformation and improvement of statistics by publishing its strategic priorities for official statistics in Scotland. The priorities are set at a high level under the broad pillars of users, efficiency, data and people. Statisticians in Scottish Government are empowered to take forward improvement work under these pillars as they see fit. This is supported centrally by a range of activity: regular communications and sessions on different ways of thinking about the use of data and applying the Code of Practice, a leadership programme for statisticians focused on improvement, and peer community groups to tackle cross-cutting statistical issues.

Comparability and coherence

Comparability (the ability to compare statistics over time and by geographic region or topic area) and coherence (the ability of related statistical outputs to explain the topic they cover in a consistent way when used together) have continued to be important areas in which statistics producers invest resources to bring about improvements.

For each of the nations of the UK, there are different policy contexts reflecting the devolution settlement. This means the delivery of services in areas such as health and education differs across governments, and therefore, operational data and the focus of data collected for policy evaluation vary across the four nations. It can be challenging for the system to reconcile these differences and produce comparable statistics.

Producers should explain how their statistics do, and do not, compare with statistics for other parts of the UK. The blog post by the Chief Statistician of Wales on Comparing NHS performance statistics across the UK serves as a good example. Where it is not feasible to produce UK-comparable outputs, producers should support users by signposting other related statistics and clearly explaining what is and what is not comparable across the UK, as well as the different methodologies used.

There are a number of examples of good practice in health and social care statistics. Work led by ONS with partners across the UK (including government departments, health departments and health bodies) is making it easier to understand the comparability of health data for England, Wales, Scotland and Northern Ireland. This includes ongoing work to improve the cross-UK comparability of Accident and Emergency wait time statistics. Other examples of recent work to bring together comparable statistics from across the UK include statistics on fuel poverty and homelessness.

The ONS Local and Coherence division, devolved governments and relevant departments have been working together to create new UK-wide data in high-priority areas of shared interest. These have included new data on public transport availability and house building.

Our recent review of the quality of the police recorded crime statistics for England and Wales highlighted the crime trends explainer published by ONS as good practice in explaining the coherence of related sets of statistics. The explainer sets out the different ways that crime is measured, explains which measure is best suited to different crime types and discusses some of the trends that have emerged.

The comparability of UK census outputs and population statistics will be a particular focus this coming year. The ongoing development of the admin-based population estimates, and the pace of delivery of what is a new method of estimating the population in England and Wales, will inevitably have implications for the comparability of population estimates across the UK.

It is important that the system, co-ordinated by ONS, build in user needs for UK-wide coherence from the outset, working in partnership across all parts of the UK. A key factor will be ensuring that all parts of the system are adequately resourced for addressing coherence issues.

UK comparability issues were explored in detail in both the Lievesley Independent Review of the UK Statistics Authority and the PACAC Transforming the UK’s Evidence Base report, with the PACAC report highlighting the detriment to individual citizens in areas where it is impossible to compare the experiences of those living in each of the four nations of the UK. The committee has recommended that we conduct a review on the adequacy of comparable UK-wide data.

Data sharing and linkage

While we continue to see examples of data sharing and linkage being used to enable analysis of key societal issues, significant progress is still needed in overcoming many of the remaining, often longstanding, barriers to data sharing and linkage.

In July 2023, we published our Data Sharing and Linkage for the Public Good report. We found that the COVID-19 pandemic provided a particularly strong impetus to share data for the public good. But, despite the value of sharing and linking data being widely recognised, there remain areas of significant challenge, including uncertainties about the public’s attitude to, and confidence in, data sharing, and the culture and processes in government. The report made recommendations for overcoming barriers to data sharing and linkage for the public good under the themes of public engagement and social licence, people, processes and technical challenges. We highlighted our concern that unless significant changes are implemented, the progress made could be lost. Since our report was published, we have been engaging with the organisations that are key to delivering our recommendations, and we published an updated report and recommendations in July 2024.

While there are pockets of innovative and ambitious work happening, overall the statistical system and government have made little progress in actioning the recommendations in our 2023 report. Our update report, published in July 2024, sets out the challenges that current processes pose to effective and efficient data sharing and the need for greater leadership from government.

Despite these challenges, ambitious examples of data sharing have provided powerful insight. For example, the Ministry of Justice’s Better Outcomes through Linked Data (BOLD) programme aims to improve the connectedness of government data in England and Wales. The programme was created to demonstrate how people with complex needs can be better supported by linking and improving the government data held on them in a safe and secure way. As well as improving understanding of participants’ concerns about their data being used, BOLD has led to analytical outputs such as that published by the National Confidential Inquiry into Suicide and Safety in Mental Health based at the University of Manchester to investigate the factors associated with suicide by people accessing drug and alcohol treatment services.

Administrative Data Research Wales has gone from a proof of concept for data linkage to using linked data in focused ways to provide specific insights for the Programme for Government. Examples where this has influenced decision-making include linking Flying Start scheme attendance data to education Foundation Phase baseline on-entry assessments, linking Care & Repair home advice and modification interventions data with care home admissions data, and linking school workforce census data to COVID-19 vaccination records.
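
As a hedged illustration of the underlying technique, the sketch below shows deterministic record linkage of the general kind used to join administrative datasets on a shared identifier. Every field, record and name here is invented; real linkage programmes use pseudonymised keys, probabilistic matching and strict governance controls.

```python
# Illustrative sketch only: deterministic record linkage joining two
# administrative datasets on a normalised key. All records are invented.

def link_key(record):
    """Derive a simple match key from normalised identifying fields."""
    return (record["person_id"].strip().lower(), record["dob"])

education = [
    {"person_id": "AB123", "dob": "2015-04-02", "assessment_score": 87},
]
health = [
    {"person_id": "ab123 ", "dob": "2015-04-02", "vaccinated": True},
]

# Index one dataset by key, then join matching records from the other.
index = {link_key(r): r for r in health}
linked = [
    {**edu, **index[link_key(edu)]}
    for edu in education
    if link_key(edu) in index
]
print(linked)  # joined education and health records for matched individuals
```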

In Northern Ireland, NISRA statisticians in the Department for the Economy, the Department of Education and in the Administrative Data Research Northern Ireland (ADR NI) team are working together to develop a Longitudinal Education Outcomes database for Northern Ireland (LEO NI), which will provide insights into labour market trajectories for people with different educational backgrounds.

These examples, and others like them, demonstrate the richness and complexity of analysis that can be undertaken when cultural and technical barriers are overcome. ONS’s Integrated Data Service (IDS) and its consultation on the future of population and migration statistics in England and Wales make a compelling case for what could be achieved by sharing and linking data.

Data sharing was a key theme of the Lievesley Independent Review of the UK Statistics Authority. Mirroring our findings, the review concluded that the UKSA’s efficacy is hampered by the systemic and cultural barriers to responsible data sharing between government departments. We support the review’s call for the centre of government to take a lead role in addressing these challenges. Data sharing was further explored in the House of Commons PACAC report, Transforming the UK’s Evidence Base, which recommended that the Cabinet Office, in partnership with ONS, develop a comprehensive programme aimed at improving data sharing for statistical and research purposes.

AI opportunities and challenges

The rise of user-friendly artificial intelligence (AI) tools and large language models (LLMs) such as ChatGPT has focused attention on the potential uses of AI in the public sector.

While we have not encountered examples of AI use in official statistics production in our regulatory work, ONS and other government departments are conducting feasibility studies to test possible uses. Such studies include those testing how AI can be used to improve the searchability of statistics on websites, produce non-technical summaries, recode occupational classifications based on job descriptions and tasks, and automatically generate code to replace legacy statistical methods.
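
To illustrate the shape of the occupational recoding task, the sketch below matches free-text job descriptions to classification codes. It deliberately uses simple string similarity from the Python standard library as a stand-in; the feasibility studies referred to above test LLM-based approaches, and the codes and titles here are simplified examples rather than a real classification.

```python
# Illustrative sketch only: matching free-text job descriptions to a
# standard occupational classification. String similarity stands in for
# the LLM-based approaches under study; codes and titles are simplified.

from difflib import SequenceMatcher

CLASSIFICATION = {
    "2136": "programmers and software development professionals",
    "6141": "nursing auxiliaries and assistants",
    "9233": "cleaners and domestics",
}

def recode(job_description: str) -> tuple[str, float]:
    """Return the best-matching code and its similarity score."""
    def score(title: str) -> float:
        return SequenceMatcher(None, job_description.lower(), title).ratio()
    best = max(CLASSIFICATION, key=lambda code: score(CLASSIFICATION[code]))
    return best, score(CLASSIFICATION[best])

print(recode("software developer writing web applications"))
```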

The Code of Practice for Statistics provides a framework for ensuring the quality and trustworthiness of statistics produced with AI. Our Guidance for Models explains how the pillars in the Code can help in designing, developing and using statistical models. However, we are aware that more specific guidance, from us and others, will likely be needed in the future.

At present, the risks around AI, as viewed through the lens of the Code pillars of trustworthiness, quality and value, include:

  • Trustworthiness risks – there are concerns that malicious external agents might use AI to undermine public trust in statistics and government. Attempts to do so could include promoting misinformation campaigns to cause confusion around political issues, with targeted advertising and AI-generated blog posts, articles, and video and audio content.
  • Quality risks – quality concerns centre largely on AI models’ accuracy, potential biases introduced via model training data and transparency issues. For example, the Government Digital Service, while testing its newly developed chatbot GOV.UK Chat, experienced issues with hallucinations (incorrect information presented as fact) and accuracy that were unacceptable for public sector work, though these issues have informed further development work. Furthermore, the ‘black box’ nature of AI models makes it difficult for producers to be completely transparent about how statistical outputs are produced. The statistical system will need to find acceptable solutions to these challenges.
  • Value risks – concerns around the trustworthiness and quality of AI-generated statistical outputs and communications affect their perceived value to both organisations and the public. The latest iteration of the Public Attitudes to Data and AI Survey suggests that public sentiment towards AI remains largely negative, despite its perceived impact being reported as neutral to positive.

Guidance and strategic leadership on AI across government is evolving at pace. The AI Safety Institute has been established to focus on advanced AI safety for the public interest, and the recently published Generative AI Framework for HMG provides civil servants and people working in government organisations with guidance on using generative AI (a form of AI that can interpret and generate high-quality outputs, including text and images) safely and securely.

Given the opportunities and risks of AI, as a regulator our current focus is on two main areas: the use and regulation of AI systems, such as LLMs, in the production and communication of official statistics, and our role in responding to the use of AI to generate misinformation.

Our recommendations

We see many innovative transformation programmes, but in some cases there is a lack of join-up between these programmes, which limits the learning and sharing across the system that is fundamental to successful innovation. We would like to see the GSS share knowledge, best practice and expert support to maximise the benefits of transformation programmes.


We are working on additional Code-related guidance that will set out more clearly what we consider transformation work to be, and how we regulate and support statistics producers who are working in this way. In this guidance, we will set out the need for clearer plans and governance around these transformations to support user confidence in the programmes.


Producers should make it clear to users how their statistics compare across different geographies. If there is significant demand for direct comparisons that is not addressed by existing statistics, producers should work together to produce additional analysis.


We want to see the system working with partners to implement the recommendations set out in our Data Sharing and Linkage for the Public Good report. We will continue to advocate for, direct and advance data sharing and linkage across government through our regulatory and systemic work. OSR’s work on data sharing and linkage will also feed into our ongoing work on updating the Code of Practice for Statistics and how we set out our expectations in this area for producers.


With developments in artificial intelligence, it is vital that the statistical system be equipped to maximise the opportunities and address the challenges that will arise in this area. The statistical system, including us as its regulator, should show strong leadership in the AI era by setting relevant standards based on the Code. This includes being transparent about the real and perceived risks of using AI in official statistics and how these are addressed, to build public confidence.
