The exams algorithm story is about more than just exams

“Dreams ruined by an algorithm”. This was the headline on the BBC Northern Ireland website on 13 August. It summarised one of the main stories of the summer in all four nations of the UK. The headline reflected the human and emotional impact of the 2020 exam results, with some students getting lower grades than they expected or felt they deserved.

But the story is about more than using statistical models to award grades. The resulting backlash, focused on the role that algorithms played, threatens to undermine public confidence in statistical models more broadly.

We are concerned about this. That’s why we’ve launched a review of the way the statistical models were built and overseen. It is not our intention to apportion blame or to launch criticisms with the benefit of hindsight. Instead, we want to focus on the future: how to implement models in a way that is consistent with public confidence.

Government statisticians have performed well during the pandemic. They have responded quickly to identify, develop and produce new data and statistics – statistics which are currently an integral part of our lives. The ONS, and its counterparts in Scotland and Northern Ireland, have provided clear analysis of the human tragedy of mortality. And, as OSR’s rapid reviews have shown, statisticians have produced new data on sectors as diverse as the economy, transport, school attendance, and people’s engagement with green space.

Over the last six months, these and other statistics have served the public’s appetite for clear, trustworthy information on the unfolding covid-19 pandemic. There is clear evidence that they are meeting a real need and serving the public good.

Our overarching role at OSR is to ensure that statistics serve the public good, and we do this by setting a Code of Practice that Government must follow. The Code has public confidence in statistics at its heart. However, confidence in statistics risks being undermined by the poor public reception that the exam algorithms received in all four nations of the UK.

From the perspective of statistics serving the public good, the exams story looks worrying. Not only has there been widespread criticism of the statistical models because of the perceived unfairness of the results they produced, but there is also a risk that future deployment of statistical techniques will be held back by the chilling effect of the poor publicity. That would be a real setback: it would limit statistical innovation and mean that the public sector cannot use new approaches to providing services to the public.

The exams issue has shone a light on how statistical models are used to make decisions. We want to learn from this experience. We want to explain how statistical models can be deployed and overseen. We want to set out some basic expectations for how these models can serve the public good. And we want to show how the principles of the Code of Practice – trustworthiness, quality and value – are highly relevant to situations like this, where complex models are used to support decisions that have impacts on individual human beings.

In short, the review aims to identify lessons for public bodies considering the use of statistical models to support decisions.

Our overall aim is simply stated. The use of statistical models to support decisions is likely to increase in coming years. We want to show how government can learn from this experience – and make sure that these statistical models serve the public good.

 

A robot by any name?

My big problem with my favourite innovation of the year – the Reproducible Analytical Pipeline (RAP) – is this: what should I call it for my end-of-year blog? The full name is a mouthful, yet its acronym doesn’t give much of a clue as to what it does.

I wanted to name it Stat-bot. I imagine a cute little droid about 3 feet tall, soppy and warm-hearted, buzzing around the Government Statistical Service dispensing help and advice wherever humans need it. But the Office for Statistics Regulation due diligence department* reviewed the blog and pointed out the existence of a commercial product with this name (also, I’m probably overly influenced by my children, who find anything including the word ‘bot’ automatically hilarious). I therefore edited this blog and used a variety of alternative imaginary names for the product.

I heard about, er, Auto-stat from Steve Ellerd-Elliot, the excellent Head of Profession for Statistics at the Ministry of Justice (MoJ). He was describing their new approach to producing statistical releases and their associated commentary using the Reproducible Analytical Pipeline (RAP).

This new approach, developed in partnership with the Government Digital Service, involves automating the process of creating some of the narrative, the highlights, the graphs and so on. It’s based on algorithms that work the basic data up into a statistical release. To find out more about how RAP works, read the Data in Government blog and this follow-up post. And to be clear, it’s not just Steve and his MoJ team that are using this approach – it was developed in the Department for Digital, Culture, Media & Sport and has been picked up by the Department for Education, amongst others. The Information Services Division in Scotland have developed a similar tool.
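To give a flavour of the idea (and not to describe the actual MoJ or GDS code, which the posts above cover properly), here is a minimal, hypothetical sketch of one pipeline step in Python: a script reads this quarter’s data and produces the headline sentence and chart for the release, so the wording and the numbers can never drift apart. The file name and column names are invented for the illustration.

    # Hypothetical illustration only – not the real MoJ/GDS pipeline code.
    # The file name and the columns ("quarter", "count") are invented.
    import pandas as pd
    import matplotlib.pyplot as plt

    def build_release(csv_path: str) -> str:
        """Turn a raw quarterly data file into a headline sentence and a chart."""
        df = pd.read_csv(csv_path)
        latest = df["count"].iloc[-1]
        previous = df["count"].iloc[-2]
        change = 100 * (latest - previous) / previous
        # Automated narrative: the same wording every quarter, so no
        # transposition or drafting errors can creep in.
        headline = (
            f"In {df['quarter'].iloc[-1]} the count was {latest:,}, "
            f"a change of {change:+.1f}% on the previous quarter."
        )
        # Automated chart, saved alongside the release text.
        df.plot(x="quarter", y="count", legend=False)
        plt.savefig("headline_chart.png")
        return headline

    if __name__ == "__main__":
        print(build_release("quarterly_counts.csv"))

The point is not the particular libraries but the principle: the release is rebuilt from the raw data by code, so it is reproducible end to end, and the statistician’s time goes into interpretation rather than copy-and-paste.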

Like the statistical R2D2 of my imagination, this approach helps human statisticians in two really important ways. Firstly, Stat-O reduces the potential for human error – transposition and drafting mistakes and so on. But more significantly, robostat (?) frees up a massive amount of time for higher-level input by statisticians – the kind of quality assurance that spots anomalous features in the data, and the kind of narrative that links up to other data and topics and adds human interest to the automated release.

The other thing about … statomatic? … is that it is just the most eye-catching of a broader range of innovations Steve and his colleagues at MoJ have brought to statistics in recent months. They include:

  • a new portal for prisons data, with embedded data visualisations, which radically extends what the existing gov.uk web platform can host;
  • an associated suite of Justice data visualisation tools that are freely available to users; and
  • new developments within the Justice Data Lab to allow a wider range of analysis, with the pilot of a Justice MicroData Lab to open up access to the data.

When we launched the Office for Statistics Regulation we aimed to stand up for the public value of statistics: to set the standards for producing and publishing statistics; to challenge when these standards are not met; and to celebrate when they are. I hope we’ve balanced challenge and celebration in a sensible way through the year and through our Annual Review.

But it’s often the way of things that the challenge attracts the most attention. So I think it’s appropriate for me to make my final blog of 2017, in what is after all a season of celebration, something of a toast to – er – I mean, a toast to – um – oh well, a toast to RAP.

 

*the due diligence department doesn’t actually exist; it was a colleague in the pub who told me about the commercial product.