Automation and Technology: Getting the full picture

When you think about the Office for Statistics Regulation (OSR), you may initially think of us as a group of people who make sure that statistics are being used correctly: a ‘statistics watchdog’ of sorts. If you’re a producer of statistics, you might think about our Code of Practice, National Statistics Designation or the breadth of regulatory work that we do. You might not think of the complementary work programmes that run alongside them to help deliver this regulatory function.

One of those work programmes is the Automation and Technology (A and T) work programme, which looks at how we can automate some of the work we do at OSR so that regulators have more time to engage with the people they need to engage with. I was recruited in October last year as the Head of the A and T work programme, and a lot has happened since then that I’d like to share.

‘Automation’ is typically defined as a machine doing repetitive tasks without much human involvement, which makes it a good fit for OSR’s horizon scanning and casework identification.

Horizon scanning is where our regulators look at what’s happening across the board for statistics within their topic area, and casework includes looking into the potential misuse of statistics. Most of that information comes from the web or social media platforms, where it can be gathered automatically, for example with a social media scraper.

The first project the work programme started was to automate a statistical release calendar, which informs us of upcoming releases, added or removed publications and any changes made to release dates. It takes its information from the gov.uk research and statistics page, but the aim is to incorporate all statistical release calendars to get a full picture across all official statistics. It has proved especially useful during the COVID-19 pandemic because of the volume of new statistics for us to keep track of.
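To make the idea concrete, here is a minimal sketch of how two snapshots of a release calendar could be compared to find added, removed and rescheduled publications. It is an illustration only, not OSR's actual code: the function name `diff_calendars`, the snapshot format (a dictionary mapping a release title to its date) and the example titles and dates are all assumptions made up for this example.

```python
from datetime import date


def diff_calendars(previous, current):
    """Compare two calendar snapshots (release title -> release date).

    The snapshot format is an assumption for this sketch; a real calendar
    would first need to be collected and parsed from its source page.
    """
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    # Releases present in both snapshots whose date has moved.
    rescheduled = {
        title: (previous[title], current[title])
        for title in sorted(set(previous) & set(current))
        if previous[title] != current[title]
    }
    return {"added": added, "removed": removed, "rescheduled": rescheduled}


# Invented example data, purely for illustration.
yesterday = {
    "Release A": date(2020, 6, 1),
    "Release B": date(2020, 6, 8),
    "Release C": date(2020, 6, 15),
}
today = {
    "Release A": date(2020, 6, 1),
    "Release B": date(2020, 6, 10),  # date changed
    "Release D": date(2020, 6, 20),  # newly announced
}

changes = diff_calendars(yesterday, today)
print(changes)
```

Running a comparison like this each day against the previous day's snapshot is one simple way to surface only what has changed, rather than asking someone to re-read the whole calendar.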

Although it is titled ‘Automation and Technology’, the work programme actually encompasses quite a bit of Data Science and Data Visualisation work too. After data has been gathered from the web, data mining techniques are needed to structure it into something usable. After that, meaning or insight needs to be extracted, and a good way to do that is to use Natural Language Processing (NLP), the discipline within Data Science that deals with the analysis of text data.

Finally, the output of that analysis needs to be communicated to the user and, love them or hate them, dashboards are a great way to visualise the output and keep everything in one easily accessible place. One of my favourite data visualisation tools for Python, particularly for creating interactive dashboards, is Plotly’s Dash. Not only does it have lots of functionality, it’s also not quite as tricky to code as other tools such as D3, and it integrates really nicely with cloud platforms such as Google Cloud Platform for deployment.

During the COVID-19 pandemic we’ve been busier than ever responding to concerns about misleading statistics and pulling together to produce Rapid Reviews of new or changing statistics. One of the ways the A and T work programme has been facilitating that is by creating a Twitter dashboard, which brings together all of the above techniques to let us see what is being talked about around COVID-19 statistics. It runs every day, collects the tweets related to a provided search term and then mines them to provide the top retweeted tweets, top hashtags, popular links and other useful metrics. The code is open source and can be found on our GitHub page.
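As a rough illustration of the mining step, the sketch below counts hashtags and links and ranks tweets by retweets. It is not the dashboard's real code: the tweet format (a dictionary with `text` and `retweets` keys), the function name `summarise_tweets`, the simple regular expressions and the sample tweets are all assumptions invented for this example, and collecting the tweets in the first place is left out.

```python
import re
from collections import Counter

# Deliberately simple patterns; real tweets would need more careful handling.
HASHTAG = re.compile(r"#(\w+)")
LINK = re.compile(r"https?://\S+")


def summarise_tweets(tweets, top_n=3):
    """Return the top hashtags, top links and most retweeted tweets."""
    hashtags = Counter()
    links = Counter()
    for tweet in tweets:
        text = tweet["text"]
        # Lower-case hashtags so #COVID19 and #covid19 are counted together.
        hashtags.update(tag.lower() for tag in HASHTAG.findall(text))
        links.update(LINK.findall(text))
    most_retweeted = sorted(tweets, key=lambda t: t["retweets"], reverse=True)
    return {
        "top_hashtags": hashtags.most_common(top_n),
        "top_links": links.most_common(top_n),
        "most_retweeted": most_retweeted[:top_n],
    }


# Invented sample tweets, purely for illustration.
tweets = [
    {"text": "New figures out today #COVID19 #statistics https://example.org/release", "retweets": 12},
    {"text": "Is this chart misleading? #covid19 https://example.org/release", "retweets": 40},
    {"text": "Weekly update #COVID19 https://example.org/weekly", "retweets": 5},
]

summary = summarise_tweets(tweets)
print(summary["top_hashtags"])
print(summary["top_links"])
```

The resulting summary is the sort of structure that could then be passed to a dashboard for display.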

Future

So, what’s in store for the future of A and T in OSR? Well, we have lots lined up in terms of coding projects. One such project will look at the impact of our interventions, which will help inform the best way to intervene in future. We have also recently released a statement describing the work we are looking to carry out on what we need to do as an organisation to regulate the growing use of Artificial Intelligence technologies within official statistics.

If you have any comments on our planned work or anything relating to the A and T work programme at OSR, then please feel free to email me at Emily.tew@statistics.gov.uk