A model’s journey to Trustworthiness, Quality and Value

OSR’s Head of Data and Methods, Emily Barrington, explores the work undertaken to deploy artificial intelligence and statistical modelling within government while adhering to high regulatory standards

Thinking back to when I joined the Office for Statistics Regulation (OSR) in late 2019 (just before the pandemic hit), Artificial Intelligence (AI) and other complex statistical modelling were still in their infancy within government. There were pockets of work being done here and there, and guidance was being produced, but there was nothing public-facing and nothing to help analysts understand how to organise model development in a way that could help instil trust from the public’s perspective.

At this point you may be thinking: but aren’t you the regulator of statistics? Why are you thinking about AI models? Well, two reasons. Firstly, it comes down to definition. AI is the new buzzword, but when you strip it back to its core components it’s really just complex statistical modelling (albeit on a larger scale and with bigger computers!), so any guidance that applies to statistical modelling will also apply to AI, and vice versa. Secondly, helping build public trust is in our ethos and, when it comes to AI use within government, the outputs of such models often have a public impact – be it directly or indirectly through policy change.

Not long after I joined, I started looking at how we at OSR could have a voice in this area and champion best practice through our pillars of Trustworthiness, Quality and Value (TQV).

The pandemic effect

If anything, the need for data and insight throughout the pandemic helped break down some of the barriers that had been stopping AI and complex modelling from taking off within government. Data sharing and public acceptance of data use have generally been greater during the pandemic, perhaps driven by the need to help save lives. This drive, however, sometimes led to misjudgement, as happened when exam grades were awarded in 2020 – the episode that prompted our review on ‘Securing public confidence in algorithms’. This was the first time OSR had worked on anything so specifically related to algorithms, and the lessons drawn from the work resonated well; people thought we had something to give – and we agree with them!

This work also made us think outside the box when it came to the Code of Practice for Statistics (the Code). After all, the model used to award exam results was not official statistics, nor was it AI for that matter, but the Code still helped us when making our judgements.

Back to championing best practice

By the time the review on awarding exam results was published, we had already started putting down some thoughts on how the Code could be applied when using models, and later that year our alpha version of ‘Guidance for Models: Trustworthiness, Quality and Value’ was published. We published it as an alpha because we wanted to get as much feedback as possible before promoting it more widely – this was our first time in this space, after all. We also felt there might be a better way to present the messages, but this needed further thought and input from the wider analysis and data science communities.

The pillars of Trustworthiness, Quality and Value (TQV)

Since the publication of the alpha guidance, we have come a long way in thinking about what the Code and its pillars really embody when broken down, and we have matured our thinking on statistical modelling. Today we published our finalised version of ‘Guidance for models: Trustworthiness, Quality and Value’, which takes the TQV messages and brings them to life for model planning and development. We have softened our focus on the Code principles since the alpha version and taken a step back to concentrate on the Code considerations that matter most for ensuring models serve the public good. This change came from feedback from the analytical and data science communities that the messages are stronger when not linked to the Code specifically. We have also incorporated the lessons from our review ‘Securing public confidence in algorithms’ and our follow-up case study on QCOVID.

We now have guidance which we believe embodies what is needed to help build public confidence and trust when deploying statistical models. But I guess the proof is in the pudding…


If you have any feedback, thoughts or use cases where you found our guidance helpful please do not hesitate to contact OSR Data and Methods – we’d love to hear from you!

Related links

Guidance for Models: Trustworthiness, Quality and Value

Automation and Technology: Getting the full picture

When you think about the Office for Statistics Regulation (OSR) you may initially think of us as a group of people who make sure that statistics are being used correctly, a ‘statistics watchdog’ of sorts. If you’re a producer of statistics, you might think about our Code of Practice, National Statistics Designation or the breadth of regulatory work that we do.  You might not think of the complementary work programmes we have alongside them to help deliver this regulatory function.

One of those work programmes is the Automation and Technology (A and T) work programme, which looks at how we can automate some of the work we do at OSR to give regulators more time to engage with the people they need to engage with. I was recruited in October last year as Head of the A and T work programme, and since then a lot has been happening that I’d like to share.

‘Automation’ is typically defined as a machine carrying out repetitive tasks without much human involvement, which makes it perfect for OSR’s horizon scanning and casework identification.

Horizon scanning is where our regulators look at what’s happening across the board for statistics within their topic area, and casework includes looking into the potential misuse of statistics. If you think about where most of that information comes from, you’ll think of the web or social media platforms, and that information can be gathered automatically using a web or social media scraper.

The first project the work programme started was automating a statistical release calendar, which informs us of upcoming releases, added or removed publications and any changes made to release dates. It takes its information from the gov.uk research and statistics page, but the aim is to incorporate all statistical release calendars to get a full picture across all official statistics. It has proved most useful during the COVID-19 pandemic because of the volume of new statistics we need to keep track of.
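The internals of our calendar tool aren’t described here, but the core comparison step – spotting added publications, removed publications and changed release dates between two snapshots of a calendar – can be sketched in a few lines of Python. The publication names and dates below are hypothetical, and a real tool would first scrape the snapshots from the web:

```python
from datetime import date

def diff_release_calendar(previous, current):
    """Compare two snapshots of a release calendar.

    Each snapshot maps a publication title to its announced release date.
    Returns the added publications, the removed publications, and any
    publications whose release date has changed.
    """
    added = {title: current[title] for title in current.keys() - previous.keys()}
    removed = {title: previous[title] for title in previous.keys() - current.keys()}
    changed = {
        title: (previous[title], current[title])
        for title in previous.keys() & current.keys()
        if previous[title] != current[title]
    }
    return added, removed, changed

# Two hypothetical snapshots of the calendar, a day apart
yesterday = {
    "Weekly deaths": date(2021, 6, 1),
    "Labour market overview": date(2021, 6, 15),
}
today = {
    "Weekly deaths": date(2021, 6, 8),                 # release date moved
    "Coronavirus infection survey": date(2021, 6, 4),  # new publication
}

added, removed, changed = diff_release_calendar(yesterday, today)
```

Running this flags the new infection survey publication, the dropped labour market release and the moved “Weekly deaths” date, which is exactly the kind of change a regulator would want surfaced automatically each morning.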

Although it is titled ‘Automation and Technology’, the work programme actually encompasses quite a bit of Data Science and Data Visualisation work too. After data has been gathered from the web, data mining techniques are needed to structure it into something usable. After that, meaning or insight needs to be extracted, and a good way to do that is with Natural Language Processing (NLP), the discipline within Data Science that deals with the analysis of text data.
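As a flavour of what that NLP step can look like at its simplest, the sketch below tokenises some scraped text and counts the most frequent terms after removing common stopwords. The input sentences and the stopword list are illustrative only; a real pipeline would build on this with stemming, n-grams or topic modelling:

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real pipelines use much larger ones
STOPWORDS = {"the", "of", "and", "to", "a", "in", "is", "for", "on", "that"}

def top_terms(texts, n=5):
    """Tokenise each text, drop stopwords, and return the n most common terms."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

scraped = [
    "New statistics on vaccination rates published today",
    "Vaccination statistics show rates rising across regions",
]
top = top_terms(scraped, n=3)
```

Even this crude frequency count surfaces the recurring themes (“statistics”, “vaccination”, “rates”) across the two hypothetical snippets, which is the basic idea behind extracting insight from unstructured text.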

Finally, the output of that analysis needs to be communicated to the user and, love them or hate them, dashboards are a great way to visualise the output and keep everything in one easily accessible place. One of my favourite data visualisation tools for Python, particularly for creating interactive dashboards, is Plotly’s Dash. Not only does it have lots of functionality, it’s not quite as tricky to code as other tools such as D3, and it integrates really nicely with cloud platforms such as Google Cloud Platform for deployment.

During the COVID-19 pandemic we’ve been busier than ever responding to concerns about misleading statistics and pulling together to produce Rapid Reviews of new or changing statistics. One way the A and T work programme has been facilitating that is by creating a Twitter dashboard which uses all of the above techniques to let us see what is being talked about around COVID-19 statistics. It runs every day, collects the tweets related to a provided search term, and then mines them to produce the top retweeted tweets, top hashtags, popular links and other useful metrics. The code is open source and can be found on our GitHub page.
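The mining step behind those metrics can be sketched with the standard library. The tweet dictionaries and field names below are simplified stand-ins for what a Twitter API client returns, not the dashboard’s actual data model:

```python
import re
from collections import Counter

def mine_tweets(tweets):
    """Mine a list of tweets for top retweeted tweets, hashtags and links.

    Each tweet is a dict with (at least) a "text" and a "retweets" field –
    an illustrative stand-in for the fields a Twitter API client provides.
    """
    hashtags = Counter()
    links = Counter()
    for tweet in tweets:
        hashtags.update(re.findall(r"#\w+", tweet["text"].lower()))
        links.update(re.findall(r"https?://\S+", tweet["text"]))
    top_retweeted = sorted(tweets, key=lambda t: t["retweets"], reverse=True)
    return top_retweeted, hashtags.most_common(3), links.most_common(3)

# Hypothetical tweets matching a COVID-19 statistics search term
tweets = [
    {"text": "New #COVID19 stats out: https://example.org/stats", "retweets": 120},
    {"text": "Questioning today's #COVID19 #testing figures", "retweets": 45},
]
top_retweeted, top_hashtags, top_links = mine_tweets(tweets)
```

Sorting by retweet count and tallying hashtags and links with a `Counter` is all that’s needed to turn a day’s raw tweets into the headline metrics a dashboard can display.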


So, what’s in store for the future of A and T at OSR? Well, we have lots lined up in terms of coding projects. One project will look at the impact of our interventions, which will help inform the best way to intervene in future. We have also recently released a statement describing work we are looking to carry out on what we need to do as an organisation to regulate the growing use of Artificial Intelligence technologies within official statistics.

If you have any comments on our planned work or anything relating to the A and T work programme at OSR, then please feel free to email me at Emily.tew@statistics.gov.uk