First posted on the blog for Data & Policy: a peer-reviewed, open access journal at the interface of data science and governance published by Cambridge University Press.
The pandemic has brought a lot of change to our lives and to the economy. It has brought new things to the fore, but just as much it has helped speed up already apparent trends — like the challenges facing high street retailers, which predated the pandemic but have probably been accelerated by it.
This is also true of data and policy. The pandemic has helped speed up the trend towards a greater role for data in policymaking.
It’s been clear for a while that policymaking can benefit a great deal from better use of data and digital skills. The rationale for the Data for Policy conference — a forum on the impact of the digital revolution on government, taking place virtually this year on 15–17 September — illustrates this:
The inception of Data for Policy, in 2015, was as a result of the observation, by academics in the field, that the rapidly emerging and developing digital technologies can be utilised to inform and direct public policy. The central idea was to bring together technologists and those involved in public policy, and thereby facilitate an interdisciplinary forum of discussion.[1]
The pandemic seems to have accelerated this process. The role of scientific expertise — gathering, modelling and interpreting data — has been to the fore in almost all countries confronting the pandemic. The key data — cases, tests, deaths — have been published widely and scrutinised intensively by the media and by citizens, making armchair epidemiologists of us all (as I wrote in a recent blog: the armchair epidemiologists).
And the impacts of the pandemic have been tracked using new sources of data, drawing on tools for data access and scraping. In June 2020, Andy Haldane, chief economist at the Bank of England, celebrated the use of near-real-time data to understand the economic impact of the pandemic. He described it as a shift in the technological frontier of measuring the economy:
The emergence of a new suite of fast indicators, including from the UK Office for National Statistics, has significantly shifted the technological frontier when monitoring the economy.[2]
But amid this wave of techno-optimism, there is caution. Some have voiced unease about aspects of this new speed. In the world of science, articles are being published without the full peer review they would normally receive. More generally, UK Government claims that they are merely “following the science” have been queried. For example, Venki Ramakrishnan, President of the Royal Society, blogged that:
there is often no such thing as following “the” science. Reasonable scientists can disagree on important points…[3]
The dewy-eyed positivity with which “science” and “data” are heralded can overlook downsides. This was pointed out recently by Chris Giles. Writing in the Financial Times, he sounded a note of caution about new sources of data on the economy that promise real-time insight. They were, he said, like fast food: tempting but bad for you.[4] Perhaps this goes too far — surely newer, faster data sources have their place. But what his article rightly conveys is scepticism about claims that new sources of data are some panacea, some perfect new technology that will suddenly help society make perfectly wise economic decisions.
This scepticism about claims for new sources of data raises questions: How can we harness the power of data for policy while avoiding the sort of issues that fuel scepticism? How do we avoid new sources of data being misleading? How can we speed up the use of data to inform policy without turning them into a junk diet?
At the Office for Statistics Regulation, thinking about these questions is our day job. We set the standards for Government statistics and data through our Code of Practice for Statistics. And we review how Government departments are living up to these standards when they publish data and statistics. We routinely look at how Government statistics are used in public debate.
Based on this, I would propose four factors that ensure new data sources and tools serve the public good:
When data quality is properly tested and understood:
As my colleague Penny Babb wrote recently in a blog: “Don’t trust the data. If you’ve found something interesting, something has probably gone wrong!” People who work routinely with data develop a sort of innate scepticism, which Penny’s blog captures neatly. Understanding the limitations of both the data and the inferences you draw from them is the starting point for any appropriate role for data in policy. Accepting results and insights from new data at face value is a mistake. Much better to test the quality, explore the risks of mistakes, and only then to share findings and conclusions.
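To make this concrete, here is a minimal sketch (the dataset, figures and checks are entirely hypothetical) of the sort of routine tests that catch problems before an “interesting” finding is shared:

```python
import pandas as pd

# Hypothetical daily Covid-19 case counts; in practice this would be a real extract.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-07-01", "2020-07-02", "2020-07-02", "2020-07-04"]),
    "cases": [120, 135, 135, -5],
})

# Basic quality checks to run before trusting any surprising result.
issues = []
if df["date"].duplicated().any():
    issues.append("duplicate dates (possible double-counted reports)")
if df["date"].diff().dt.days.gt(1).any():
    issues.append("gaps in the series (missing reporting days)")
if (df["cases"] < 0).any():
    issues.append("negative counts (likely revisions or data-entry errors)")

for issue in issues:
    print("Check failed:", issue)
```

None of these checks proves the data are right; they simply surface the kinds of mistakes that an “interesting” result often turns out to be.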
When the risks of misleadingness are considered:
At OSR, we have an approach to misleadingness that focuses on whether a misuse of data might lead a listener to a wrong conclusion. By “wrong” we don’t mean wrong in some absolute sense of objective truth; rather that, if the listener received the data presented in a different and more faithful way, they would change their mind. Here’s a really simple example: someone might hear that, of two neighbouring countries, one has a much lower fatality rate when comparing deaths to positive tests for Covid-19. The likely belief on hearing this: that country is doing something really well to treat people with Covid-19. But then the information is re-presented: that country conducts tests repeatedly across most of the population on a weekly basis, so it is testing far more healthy people than its neighbour. Revised belief: perhaps that country’s low ratio of deaths to tests is not because it’s treating people well, but because it’s doing a lot more tests.
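To make the arithmetic concrete, here is a minimal sketch with entirely hypothetical figures, showing how a deaths-to-tests ratio can flatter a country that simply tests more:

```python
# Two hypothetical countries with identical deaths but different testing regimes.
deaths_a, tests_a = 1_000, 50_000    # Country A: tests mainly symptomatic people
deaths_b, tests_b = 1_000, 500_000   # Country B: tests widely, including healthy people

ratio_a = deaths_a / tests_a  # 0.02  -> a "2.0% fatality rate"
ratio_b = deaths_b / tests_b  # 0.002 -> a "0.2% fatality rate"

# Country B's ratio looks ten times better, yet deaths are identical:
# the denominator, not the quality of treatment, drives the gap.
print(f"Country A: {ratio_a:.1%}, Country B: {ratio_b:.1%}")
```

The numbers are invented, but the mechanism is the one described above: re-presenting the same deaths against a different denominator changes the conclusion a listener would draw.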
Our work constantly involves considerations like this: how are the data being interpreted? What are the confounding factors? Our advice is to always think about how the presentation of data could be misleading.
When the data fill gaps:
Data gaps come in several forms. One gap, highlighted by the interest in real-time economic indicators, is timing. Economic statistics don’t really tell us what’s going on right now. Figures like GDP, trade and inflation tell us about some point in the (admittedly quite) recent past. This is the attraction of the real-time economic indicators, which the Bank of England have drawn on in their decisions during the pandemic. They give policymakers a much more real-time feel by filling in this timing gap.
Other gaps are not about time but about coverage. Take self-employment. It’s long been an issue that the UK’s statistics on income and earnings struggle to give good insights into the position of the self-employed, as this blog by my colleague Elise Baseley highlights. The self-employed are not part of the standard measures based on payroll earnings. During the pandemic, many self-employed people have not been able to work, and it’s important that their position is well understood. Indeed, the support schemes created by the UK Government provide an opportunity to gain new and better insight, helping to close this data gap.
Another gap in coverage during the pandemic, at least at the outset, concerned care homes in the UK. There were good data releases on the impact of Covid-19 in hospitals, but much less on care homes. Through new data collections and analysis, this gap has steadily closed.
When the data are available:
Perhaps the most important thing for data and policy is to democratise the notion of who the data are for. Data (and policy itself) are not just for decision-making elites. They are a tool to help people make sense of their world and of what is going on in their community, and to help frame and guide the choices they make.
For this reason, I often instinctively recoil at narratives about data that focus on the usefulness of data to decision-makers. Of course, we are all decision-makers of one kind or another, and data can help us all. But I always suspect that the “data for decision-makers” narrative harbours an assumption that decisions are made by senior, central, expert people, who make decisions on behalf of society; people who are, in the words of the musical Hamilton, in the room where it happens. It’s this implication that I find uncomfortable.
That’s why, during the pandemic, we at the Office for Statistics Regulation have repeatedly argued that data should be made available. We have published a statement that any management information referred to by a decision maker should be published clearly and openly. We call this equality of access.
We fight for equality of access. We have secured the publication of a great deal of data — on positive Covid-19 cases in England’s Local Authorities, on Covid-19 in prisons, on antibody testing in Scotland, and several other topics.
Data and policy are a powerful mix. They offer huge benefits to society in terms of defining, understanding and solving problems, and thereby in improving lives. We should be pleased that the coming together of data and policy is being sped up by the pandemic.
But to secure these benefits, we need to focus on four things: quality, misleadingness, gaps, and public availability.
This is how we avoid data for policy becoming a junk-food diet.
[1] https://dataforpolicy.org/about/ — retrieved 24 July 2020
[2] Andy Haldane, The Second Quarter, speech on 30 June 2020, https://www.bankofengland.co.uk/-/media/boe/files/speech/2020/the-second-quarter-speech-by-andy-haldane.pdf?la=en&hash=3B82F9C046B7BCDA160AE8BE558B1EB58CFF21EB — retrieved 24 July 2020
[3] Venki Ramakrishnan, President of the Royal Society, Following the Science, blog on 18 May 2020 — retrieved 24 July 2020
[4] Chris Giles, Fast economic data is like fast food — tempting but bad for you, Financial Times, 9 July 2020, https://www.ft.com/content/366653da-fc7b-4f3d-bf2f-ef95dfc18041 — retrieved 24 July 2020