Why Official Statistics producers are vital to administrative data research

Today, the Office for Statistics Regulation (OSR) published their report on ‘Unlocking the value of data through onward sharing’. As Director of a partnership that exists to do just that, I wholeheartedly welcome this new guidance. The report makes clear that the principles in the Code of Practice for Statistics – which ensure statistics are high quality and have public value – extend beyond statistics production to data sharing and access.

From our perspective, statistics producers are in an enviable position: if there is data of sufficient quality to support decision making, they will generally have access to it. Statistics producers will also have spent the time needed to understand the data’s quality issues, and how it should be curated to support research and analysis.

This means statisticians are exactly the people who should feel empowered to help facilitate administrative and survey data being made accessible to external researchers, through appropriate routes. As the report articulates, this includes the full spectrum from publication as open data, through to using secure research facilities such as those offered by ADR UK and other ESRC investments such as the UK Data Service.

Of the data standards elements presented, the two that are core to the vision of ADR UK are that data should be linkable and curated.

By linking administrative data sources, it is possible to reach across traditional departmental boundaries to more fully understand the impact of policies on society. As is becoming clear in the management of the Covid-19 pandemic, it is not enough to have good data about the health of the population if we don’t also understand other elements of people’s lives such as their caring responsibilities, job security, income, living conditions and ethnicity. All these factors interact to determine how different sectors of society will be affected, in terms of both health and other elements of wellbeing. It is only by linking this data and making it available to researchers, following the principles of the Five Safes, that we can properly understand the impact of the pandemic on society.
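The kind of linkage described above can be sketched in miniature. This is a hypothetical, illustrative example only: the identifiers and field names are invented, and real linkage within trusted research environments involves far more careful pseudonymisation, governance and (often) probabilistic matching. It simply shows the basic idea of replacing direct identifiers with a one-way hash and joining de-identified records on the resulting key.

```python
import hashlib

def pseudonymise(identifier: str, salt: str = "project-specific-salt") -> str:
    """Replace a direct identifier with a salted one-way hash."""
    return hashlib.sha256((salt + identifier).encode()).hexdigest()

# Two fictional administrative datasets, de-identified before linkage.
health_records = [
    {"id": pseudonymise("QQ123456A"), "condition": "asthma"},
    {"id": pseudonymise("QQ654321B"), "condition": "diabetes"},
]
employment_records = [
    {"id": pseudonymise("QQ123456A"), "job_security": "fixed-term"},
]

# Link the datasets on the pseudonymised key (an inner join).
employment_by_id = {r["id"]: r for r in employment_records}
linked = [
    {**h, **employment_by_id[h["id"]]}
    for h in health_records
    if h["id"] in employment_by_id
]
```

Because both datasets apply the same salted hash, records for the same person can be joined without either dataset holding the raw identifier.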

For data owning organisations to engage in the creation of new datasets for research, they need to know that the effort they put in is justifiable. This is why curated data is so important. Anyone who has been involved in setting up a new data sharing agreement knows that this is an understandably detailed and lengthy process. Knowing there is a commitment to continued curation of the data means the research value can be maximised, and the initial resource needed to make it accessible can reap rewards for years to come.

Moving forwards, we are aiming for datasets created as part of the ADR UK investment to be trackable. Public money is used to fund our programme, and we need to be able to show the public, data owning organisations and government the research their investment is facilitating. Building our published case study collection will not only make it easier for decision makers to find policy-relevant research, but will also help reassure data owning organisations that time invested in working with us is well spent.

At ESRC and across the ADR UK partnership, we are also excited about the potential for synthetic datasets to improve researchers’ ability to use data. These would help researchers develop their proposals, and could also play a vital role in training the next generation of researchers to use administrative datasets effectively. Like a flight simulator, they could enable rigorous and realistic training to be delivered without requiring direct access to sensitive linked datasets, which is rightly very tightly controlled.
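The ‘flight simulator’ idea can be sketched very simply. In this toy example – with entirely fictional data – each variable is sampled independently from the values observed in a source table, so the synthetic rows have realistic marginal distributions but none is a copy of a real record. Real synthetic data generation is much more sophisticated: it also aims to preserve relationships between variables and must be assessed for disclosure risk.

```python
import random

# Fictional source table for illustration only.
source = [
    {"age_band": "16-24", "employed": True},
    {"age_band": "25-49", "employed": True},
    {"age_band": "25-49", "employed": False},
    {"age_band": "50-64", "employed": True},
]

def synthesise(records, n, seed=0):
    """Generate n synthetic rows by sampling each variable
    independently from its observed values."""
    rng = random.Random(seed)
    return [
        {key: rng.choice([r[key] for r in records]) for key in records[0]}
        for _ in range(n)
    ]

synthetic_data = synthesise(source, n=100)
```

A researcher could develop and test analysis code against `synthetic_data` before ever applying for access to the sensitive original.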

ADR UK brings external researchers closer to policymakers, to support evidence-based policymaking. Statisticians and others involved in the production of Official Statistics are key to us forming this bridge, which is why the three Chief Statisticians from the devolved administrations, as well as a representative from the Office for National Statistics, all sit on our Leadership Committee. We look forward to supporting the OSR and statistics producers across government to deliver on the ambitions of this report.


This is a guest blog from Dr Emma Gordon, Director of the ADR UK (Administrative Data Research UK) programme at the Economic and Social Research Council (ESRC)

Joining Up Data

Jeni Tennison, CEO of the Open Data Institute, responds to our Joining Up Data for Better Statistics report.

Data is moving from being scarce and difficult to process to being abundant and easy to use. But harnessing its value for economic and social benefit – in ways that support innovation and deliver social justice – is not straightforward.

At the Open Data Institute (ODI), we would like to see a future where people, organisations and communities use data to make better decisions, more quickly. This would help our economies and societies to thrive. Using data and statistics well underpins research; enables us to innovate; informs the creation of more effective products, services and policies; and fuels discovery, economic growth and productivity.

In the future we would like to see, people can trust organisations to manage data ethically and benefits arising from data are distributed fairly. Data is used to meet the needs of individuals, communities and societies.

The Joining Up Data for Better Statistics review from the Office for Statistics Regulation (OSR) focuses on an essential part of this open, trustworthy data ecosystem: how to safely link together and share data from across different data stewards for analysis, research and generating statistics.

Data as roads

At the ODI, we often use the analogy of data being like roads. Where we use roads to navigate to a location, we use data to navigate to a decision.

The road analogy highlights the importance of joining up data. A single road only takes us between two locations; roads’ real value comes from being part of a network. Data works in the same way: it is not just having more data that unlocks its value, but linking it together. Data is not individual datasets, it is a network: a data infrastructure.

We can apply the ‘data as roads’ analogy to the Code of Practice for Statistics’ three pillars:

  • Roads are valuable when they go to places people want to go to; similarly, data and statistics add value when they help answer society’s questions.
  • Well-paved roads help us travel more quickly, but even rough tracks can be useful if you have the right vehicle – you need to know what to expect when you’re planning a journey; similarly, high-quality data is best, but lower quality data can be useful if you are aware of its limitations when drawing conclusions.
  • To avoid danger, we rely on engineers to use good practices to build and maintain roads, bridges and tunnels and on road users obeying the rules of the road; similarly, we rely on data custodians and data users to collect, maintain, use and share data in trustworthy ways.

Open and trustworthy

Like our road infrastructure, for our data infrastructure to generate value it has to be both as open as possible and trustworthy.

Data is more useful when more people can access and use it. It is most useful when it can be joined together. Data that is inaccessible – or where access takes so long it is rendered irrelevant – is of limited utility.

At the same time, greater access and linkage – particularly with personal data – can increase the potential for harmful impacts. The result of unethical, inequitable and opaque use of data goes beyond direct impacts on affected individuals: it can undermine trust more widely, causing people to withdraw consent.

This ultimately affects the quality and representativeness of the data we have, the data we need to understand our populations, to meet their needs, and to innovate.

As the OSR’s review highlights, there is still much to do to increase both data’s openness and its trustworthiness. We need better technical guidance and approaches, through data trusts perhaps, but we also need to upskill data stewards so they can understand and weigh risks and benefits, quickly and well.

We are still learning how to share and join up data in open and trustworthy ways. Being open and transparent about the decisions we make as we use and share data can build trust and speed up this learning, so we can all benefit from data.

Joining Up Data for Better Statistics

To speak to people involved in linking Government datasets is to enter a world that at times seems so ludicrous as to be Kafkaesque. Stories abound of Departments putting up arcane barriers to sharing their data with other parts of Government; of a request from one public sector body being treated as a Freedom of Information request by another; and of researchers who have to wait so long to get access to data that their research funding runs out before they can even start work.

Our report, Joining Up Data for Better Statistics, published today, was informed by these experiences and more.

The tragedy is that it doesn’t have to be this way. We encountered excellent cases where data are shared to provide new and powerful insights – for example, on where to put defibrillators to save the most lives; how to target energy efficiency programmes to reduce fuel poverty; which university courses lead to higher earnings after graduation. These sorts of insight are only possible through joining up data from different sources. The examples show the value that comes from linking up data sets.

This points to a gap between what’s possible in terms of valuable insights, especially now the Digital Economy Act creates new legal gateways for sharing and linking data, and the patchy results on the ground.

It leads us to conclude that value is being squandered because data linkage is too hard and too rare.

We want to turn this on its head, and make data linkage much less frustrating. We point to six outcomes that we see as essential to support high quality linkage and analysis, with robust safeguards to maintain privacy, carried out by trustworthy organisations including the Office for National Statistics (ONS) and government Departments. The six outcomes are that:

  • Government demonstrates its trustworthiness to share and link data through robust data safeguarding and clear public communication
  • Data sharing and linkage help to answer society’s important questions
  • Data sharing decisions are ethical, timely, proportionate and transparent
  • Project proposal assessments are robust, efficient and transparent
  • Data are documented adequately, quality assessed and continuously improved
  • Analysts have the skills and resources needed to carry out high-quality data linkage and analysis

The report seeks to make things better. The six outcomes are the underpinnings of this. The report supports them with recommendations designed to help foster this new, better environment for trustworthy data linkage. The good news is that there is a strong coalition of organisations and leaders wanting to take this forward both inside and outside Government. This includes the National Statistician and his team at ONS, strong data linkage networks in Scotland, Wales and Northern Ireland, and new bodies like the Centre for Data Ethics and Innovation, UK Research and Innovation and the Ada Lovelace Institute. Alongside this blog we’re publishing a blog from Jeni Tennison, CEO of the Open Data Institute, which shows the strong support for this agenda outside Government.

We want statistical experts in Government, and those who lead their organisations, to achieve the six outcomes. When they do so, they will ensure that opportunities are no longer squandered. And the brilliant and valuable examples we highlight will no longer be the exception: analysts will be empowered to see data linkage as a core part of their toolkit for delivering insights.

Code consultation: the story so far…

I am very pleased to say that we received more than 100 formal and informal responses to our Code of Practice consultation. These have come in via our formal consultation questionnaire, as well as through comments in emails. My colleagues and I have also gained valuable insight through many conversations and discussions in our road-trip of seminars and meetings with many statistics producers and user groups. I was encouraged to see just how many of you attended these events and it was really good to hear your views – thank you!

We are now working our way through the detail of the feedback and beginning to compile our consultation response. We have a lot of work to do. Your feedback has proven a rich source of constructive criticism and ideas for further improving the Code. Everyone involved in this work here in the Office very much feels the strength of a common resolve to produce a Code that can support public confidence in data and statistics.

Here is an outline of the responses we have received.

  • We had lots of comments about the Trustworthiness, Quality and Value (TQV) framework. Some supported it as being clear and supporting the public value of statistics for citizens; others felt that it needed to be clarified, extended or amended. Our overall view is that respondents see the framework as helpful, particularly in giving a succinct overview of what producers of statistics should be aiming for; but that there is a lot of subtlety we need to bring out in how the framework is explained and applied.
  • We received interest from a range of organisations in voluntary compliance, but we were also told that there is a need for a clearer explanation – from producers and from us – of how the Code should be applied beyond Official Statistics.
  • The Code was largely found to be clear, but with some areas of repetition in the sections where we targeted different audiences (relating to our wider advocacy of the Code to non-official statistics producers, both inside and outside government).
  • The Trustworthiness pillar can be further refined in relation to orderly release, independence, data governance and consent. Respondents asked for clarification on Pre-Release Access.
  • The Quality pillar can be further refined in relation to improving its applicability to all data types, including emerging ways of obtaining data, and to better reflect the nature of the coherence principle as a cross-cutting topic.
  • The Value pillar can be enhanced by more fully capturing the dimension of timeliness; better reflecting value for money and public benefit in relation to statistics production; as well as clarifying the Code’s relevance to different ways of publishing data and statistics.
  • The data diagnostic tool was generally welcomed and thought to be useful by expert users and statistics producers.
  • There was strong support for the role of the Authority as the independent regulator, as well as an advocate of good practice across all publishers of data and statistics. There was a request for clarity over the ways in which it can challenge misuse of statistics and poor practice.

Thank you again for sharing your thoughts about the Code – keep an eye on our blog for a further update about how we plan to address some of these comments. We hope to publish our consultation response in November.

Ethnicity facts and figures

Credible statistics that command trust are an essential public asset and the lifeblood of democratic debate. Statistics should be used to provide a window on our society and the economy. The value of data lies in its ability to help all in our society – from members of the public to businesses, charities, civil servants and Ministers – understand important issues and answer key questions.

The launch this week by the Cabinet Office of the Ethnicity facts and figures website is, in this context, a substantial achievement.

The website provides data from across Government departments on how outcomes from public services vary for people of different ethnicities. Some of this data has previously been published and some not. The website highlights many disparities in outcomes and treatment from public services. Specialists in particular areas – such as health, housing, and criminal justice – may have been aware of some of the data but few will be familiar with all of it.

What makes the Ethnicity facts and figures website so valuable is that it draws together detailed information from across government and presents it accessibly, neutrally and dispassionately on a single website – and all the data can be downloaded. What’s striking is that the website isn’t that flashy in its use of visualisations and other data tools. It presents the data, and describes them clearly and succinctly. Doing this, it provides a clear picture for visitors to the website.

This reflects the huge effort put into asking people what they want from the website – including members of the public, academics, central and local government, NGOs and open data experts.

This really is a model for how all statistics should be developed: find out what questions people across society want to answer, and figure out how best to present the data to them. It shows how Government departments could do much more to publish data with the public users in mind – rather than simply publishing data in the way they always have done. Focusing on the public users opens up the opportunity for innovative ways of presenting statistics.

So in my view this website is already starting to add value. But it’s also clearly still under development – there’ll be more data added to the website, and other refinements as the website responds to the ways people are using it. And no doubt it will open new avenues for research and policy intervention: the website makes information available for the public to ask the question ‘why?’. That’s the first step to understanding.

And there’s one further thing to celebrate. Alongside the Ethnicity facts and figures website, the Cabinet Office has published a Statement of Compliance with the Code of Practice for Statistics.

Though the website draws on official statistics, it is not itself an official statistics publication (though it could be in the future) – for example it didn’t follow the standard approach to publication that we expect of official statistics. Here the Statement of Compliance is really helpful as an exercise in transparency. It’s clear on the judgements and process that have gone into developing the website and recognises that it doesn’t follow the Code’s publication protocols.

And the Statement draws strength from the draft Code’s three pillars – trustworthiness, quality and value – and explains how the work has been done using the pillars as a framework.

This is in effect the first example of what we call voluntary compliance – using the Code not as a statutory obligation but as a best practice guide.

On this voluntary approach, as in much else, the Ethnicity facts and figures website is an exemplar.

Health statistics

In the last few weeks, we’ve made three comments on health statistics – one in England, about leaks of accident and emergency data; one in Scotland, on statistics on delayed discharges; and one on analysis at the UK level. They all show the importance of improving the public value of statistics.

On accident and emergency statistics, I wrote to the heads of key NHS bodies in England to express concern about recent leaks of data on performance.

Leaks of management information are the antithesis of what the Office for Statistics Regulation stands for: public confidence in trustworthy, high quality and high value information.

It’s really hard to be confident about the quality of leaked information because it almost always lacks context, description, or any guidance to users. On value, leaked information usually relates to a question of public interest, but it’s not in itself valuable, in the sense that it’s not clear how it relates to other information on the same topic. Its separated, isolated nature undermines its value. And it’s hard for leaked information to demonstrate that it is trustworthy, because the anonymous nature of the “producer” of the information (the person who leaked it) means that motives can be ambiguous.

But leaks can highlight areas where there is concern about the public availability of information. And that was the constructive point of my letter: the NHS bodies could look into reducing the risk of leaks. One way of doing this would be to reduce the time lag between the collection of the information on accident and emergency performance, and its publication as official statistics. This lag is currently around 6 weeks – 6 weeks during which the performance information circulates around the health system but is not available publicly. Shorten this lag, I argue, and the risk of disorderly release of information may also reduce.

The comments on Scotland relate to the comparability of statistics across the UK. When NHS Scotland’s Information Services Division published its statistics on delayed discharge from NHS hospitals for February, the Cabinet Secretary for Health and Sport in the Scottish Government noted that these figures compared positively to the equivalent statistics in England.

This is of course an entirely reasonable thing for an elected representative to do – to comment on comparative performance. The problem was that ISD’s publication did not tell users how to interpret the Scottish statistics in the UK context – it wasn’t clear that the Scotland figures are compiled on a different basis to the England figures, so the comparison is not like for like. The difference wasn’t stated alongside the equivalent statistics for England either. This clarification has now been provided by ISD, and NHS England have agreed to make clearer the differences between the figures in their own publication.

For us, it’s really important that there is better comparability of statistics across the UK. While there are differences in health policy that will lead to different metrics and areas of focus, it’s quite clear that there is public interest in looking at some issues – like delayed discharge – across the four UK health systems.

In this situation, good statistics should help people make sound comparisons. Yet, with health and care being a devolved matter, there are some constraints on the comparability of statistics across England, Wales, Scotland, and Northern Ireland. And, to the untrained eye, it is difficult to know what is or is not comparable – with delayed discharge data as a prime example. This is why we really welcome the recently published comparative work, led by Scottish Government, where statisticians have created a much more accessible picture of health care quality across the UK, pulling together data on acute care, avoidable hospital admissions, patient safety, and life expectancy/healthy life expectancy across all four UK countries.

Both these cases – the leaks and comparability – illustrate a broader point.

Health statistics in the UK should be much better. They should be more valuable; more coherent; in some cases more timely; and more comparable. If statistics do not allow society to get a clear picture in good time of what is going on, then they are failing to provide public value.

Migration statistics

A key aim of the Office for Statistics Regulation is to be more systemic. We want to focus not on individual sets of statistics in isolation, but to look at how they are used alongside other datasets. This reflects our ambition to be champions of relevant statistics in a changing world.

Migration statistics are an area that is ripe for this approach. We have been looking at migration statistics from several angles, reviewing different aspects of migration. We have assessed the National Insurance numbers for adult overseas nationals statistics produced by the Department for Work and Pensions. We are reviewing the ONS’s estimates of student migration in the International Passenger Survey. And we are also looking at the way in which the ONS’s Labour Force Survey estimates the number of non-UK participants in the UK labour market. We will publish the results of these reviews over the coming months, starting next Thursday (26th January) with our assessment of National Insurance numbers for adult overseas nationals.

But when we produce this work we will also reiterate a broader point. There are a range of migration-related datasets available across different Government departments, including those from HMRC, DWP, ONS and the Home Office. The key to a comprehensive picture lies in bringing these datasets together. We will therefore emphasise the crucial role that John Pullinger plays as National Statistician in ensuring that there is a joined-up approach across Government.

This will build on the letter I wrote to John last March emphasising the importance of a comprehensive, coherent picture of migration. As I said then “it is particularly important that the different sets of data are brought together in a coherent way, fully quality assured and published in an orderly manner, to paint as full a picture as possible of the patterns of migration”. In 2017, we will continue to encourage a coherent approach to one of the most important areas of statistics in contemporary public debate.