I wouldn’t blame you if you were scratching your head at the outrage expressed last week that Excel was being used to record the information on COVID-19 test results in England. After all, it’s the most used spreadsheet tool in the world. It’s also a computer program which, along with other proprietary software, has been used in public sector analysis for decades. The reason for all this concern is that it’s easy to make mistakes with Excel – like referencing the wrong cell in a calculation (we’ve all done it). And once you’ve made the mistake, it’s hard to find. It’s not clear who has been using your spreadsheet or what they have changed (or, even worse, whether Excel has taken it upon itself to change something for you). This might not matter if your spreadsheet is for holiday planning or your personal budget (yes, we’re that kind of nerd). It definitely does matter when your spreadsheet is used by multiple people to produce and present official statistics. What’s more, there is a better way.
Many statisticians and analysts are now starting to think differently and move away from off-the-shelf software with the aim of solving these problems. Within the Government Statistical Service this approach is known as a Reproducible Analytical Pipeline (also fondly referred to as RAP). It’s sometimes mis-characterised as simply automation, but it is so much more than that.
So… what is RAP?
RAP is a set of good practices and principles. It requires built-in checks and provides an audit trail of changes through version control software like git (which comes in handy if something goes wrong and you need to roll back a version!). It champions working in the open, through the publication and peer review of code on sharing and version control platforms such as GitHub. This allows collaboration and reuse of code by others, and improves users’ trust. RAP also enshrines good practice, such as well-commented and documented code, and appropriately stored and structured data. These good practices help prevent all sorts of issues from creeping in – like the flow of data being disrupted by processes that are easy to manipulate by hand.
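To make "built-in checks" a little more concrete, here is a minimal sketch in Python of what one step of a pipeline might look like. It is an illustration only, not how any particular producer works: the column names, the result categories and the functions `load_results` and `count_positive` are all made up for the example. The idea is that the code refuses to carry on if the data doesn't look the way it should, rather than quietly producing a wrong number.

```python
import csv
import io

# Hypothetical column names and categories, chosen only for this example.
REQUIRED_COLUMNS = {"test_id", "result"}
VALID_RESULTS = {"positive", "negative", "void"}


def load_results(source, expected_rows=None):
    """Read test results from CSV text, stopping loudly if any check fails."""
    reader = csv.DictReader(source)

    # Check 1: the columns we rely on are actually present.
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = list(reader)

    # Check 2: no rows were silently lost between the supplier and us.
    if expected_rows is not None and len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, read {len(rows)}")

    # Check 3: every value is a known category and every ID appears once.
    seen_ids = set()
    for row in rows:
        if row["result"] not in VALID_RESULTS:
            raise ValueError(f"unexpected result value: {row['result']!r}")
        if row["test_id"] in seen_ids:
            raise ValueError(f"duplicate test_id: {row['test_id']!r}")
        seen_ids.add(row["test_id"])

    return rows


def count_positive(rows):
    """Count positive results from named fields, not cell references."""
    return sum(1 for row in rows if row["result"] == "positive")


# Small in-memory sample standing in for a real data file.
sample = io.StringIO("test_id,result\n1,positive\n2,negative\n3,positive\n")
rows = load_results(sample, expected_rows=3)
print(count_positive(rows))  # prints 2
```

Because a script like this lives in a plain text file, every change to it can be tracked in git and reviewed by a colleague, which is where the audit trail comes from.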
The end result is a higher quality, more transparent and more efficient process, allowing more time for statisticians to use their skills to add insight and value to their outputs.
RAP to the Future
At the Office for Statistics Regulation (OSR), we see the incredible progress that has been made by official statistics producers to RAP their work. But this progress appears in pockets and there is still a way to go to make sure that RAP is not only the default approach, but that all of its elements are applied. We know that barriers to RAP exist, whether it’s access to the right tools and training or the time and support to carry out the upfront work required. This is why at OSR we have launched a review to explore the use of RAP across government statistics in more detail. We want to better understand what enables successful implementation of RAP and what prevents people either implementing RAP fully or applying elements of it. If we understand these barriers then we can do more to help resolve them and ultimately the quality, trustworthiness and value of official statistics will improve.
Now, about that Excel spreadsheet…
COVID-19 has challenged statistics producers in a way that has never been seen before, and they should be proud of the way they have risen to this challenge. Statistics were (and are being) produced from scratch and at record pace to inform both government and the public during these unprecedented times, and this contribution should be celebrated. While the error with the Excel spreadsheet was not directly part of official statistics production, there are still lessons we can learn from it. It highlights some important questions, such as:
- what tools and support were available to producers when they needed them most?
- was RAP the approach taken to setting up this new work? If not, why not?
- and how can the good practices of RAP be effectively implemented when time is short and the pressure is high?
Although our review does not focus only on COVID-19 statistics, these are the sort of questions we want to explore in order to help statistics producers on their RAP journey. If you have experience with this, or any other RAP process, please contact us at Anna.Price@Statistics.gov.uk or Emily.Tew@statistics.gov.uk – we’d love to hear your views.
Because while we can’t fix the past, we should RAP the future.