The success and potential evolution of the 5 Safes model of data access

In our latest blog, Ed Humpherson, Director General for Regulation, discusses the 5 Safes model as a key feature to support data sharing and linkage…

In OSR’s data linkage report, we highlighted the key features of the data landscape that support data sharing and linkage. The 5 Safes model is one of those. Yet we also recommended that the 5 Safes model be reviewed. In this blog, I want to focus on one aspect of the model and set out the case for a subtle but important change.

The 5 Safes model is an approach to data use that has been adopted widely across the UK research community, and has also been used internationally. It is well-known and well-supported and has had a significant impact on data governance. It is, in short, a huge success story. (And for a short history, and really interesting analysis, see this journal article by Felix Ritchie and Elizabeth Green).

The 5 Safes are:

Safe data: data is treated to protect any confidentiality concerns.
Safe projects: research projects are approved by data owners for the public good.
Safe people: researchers are trained and authorised to use data safely.
Safe settings: a SecureLab environment prevents unauthorised use.
Safe outputs: screened and approved outputs that are non-disclosive.

Any project that aims to use public sector administrative data for research purposes should be considered against the 5 Safes. The 5 Safes therefore serve as a criteria-based framework for providing assurance about the appropriateness of a particular project.

OSR’s recommendations relevant to the 5 Safes:

In July 2023, OSR published our report on data sharing and linkage in government. We had a range of findings. I won’t spell them out here, but in short, we found a good deal of progress across Government, but some remaining barriers to data sharing and linkage. We argued that these barriers must be addressed to ensure that the good progress is maintained.

We made two recommendations relevant to the 5 Safes:

Recommendation 3: The Five Safes Framework. Since the Five Safes Framework was developed twenty years ago, new technologies to share and link data have been introduced and data linkage of increased complexity is occurring. As the Five Safes Framework is so widely used across data access platforms, we recommend that the UK Statistics Authority review the framework to consider whether there are any elements or supporting material that could be usefully updated.
Recommendation 10: Broader use cases for data. To support re-use of data where appropriate, those creating data sharing agreements should consider whether restricting data access to a specific use case is essential or whether researchers could be allowed to explore other beneficial use cases, aiming to broaden the use case where possible.

We made the recommendation about reviewing the framework because a range of stakeholders mentioned to us the potential for updating the 5 Safes model, in the light of an environment of ever-increasing data availability and ever-more powerful data processing and analysis tools.

And we made the recommendation about broader use cases because this was raised with us as an area of potential improvement.

The use of 5 Safes in research projects

What brings the two recommendations together is the 5 Safes idea of “safe projects”. This aspect of the model requires research projects to be approved by data owners (essentially, the organisations that collect and process the data) for the public good.

For many research activities, this project focus is absolutely ideal. It can identify how a project serves the public good, what benefits it is aiming to bring, and any risks it may entail. It will require the researcher to set out the variables in the data they wish to explore, and the relationships between those variables they want to test.

For some types of research, however, the strictures of focusing on a specific project can be limiting. For example, a researcher who wants to establish a link between wealth and some aspects of health may not know in advance which of the variables in a wealth dataset, and which of the variables in a health dataset, they wish to examine. Using the “safe project” framing, they might have to set out specific variables, only to discover that they are not the most relevant for their research. And then they might have to go back to the drawing board, seeking “safe project” approval for a different set of variables.

Our tentative suggestion is that a small change in focus might resolve these problems. If the approval processes focused on safe programmes, this would allow approval of a broad area of research – for example, research across health and wealth datasets – without the painstaking need to renew applications for different variables within those datasets.

What I have set out here is, of course, very high level. It would need quite a lot of refinement.

Other expert views on the 5 Safes

Recognising this, I shared the idea with several people who’ve spent longer than me thinking about these issues. The points they made included:

Be careful about placing too much emphasis on the semantic difference between programmes and projects. What is a programme for one organisation or research group might be a project for another. More important is to establish clearly that broader research questions can be “safe”. Indeed, in the pandemic, projects on Covid analysis and on Local Spaces did go ahead with a broader-based question at their heart.
This approach could be enhanced if Data Owners and Controllers are proactive in setting out what they consider to be safe and unsafe uses of data. For example, they could publish any hard-line restrictions (“we won’t approve programmes unless they have the following criteria…”). Setting out hard lines might also help Data Owners and Controllers think about programmes of research rather than individual projects by focusing their attention on broader topics rather than specifics.
In addition, broadening the Safe Project criterion is not the only way to make it easier for researchers to develop their projects. Better metadata (which describe the characteristics of the data) and synthetic data (which create replicas of the dataset) can also help researchers clarify their research focus without needing to go through the approvals process. There have already been some innovations in this area – for example, the Secure Research Service developed an exploratory route that allows researchers to access data before putting in a full research proposal – although it’s not clear to me how widely this option is taken up.
Another expert pointed out the importance of organisations that hold data being clear about what’s available. The MoJ Data First programme provides a good example of what can be achieved in this space – if you go to the Ministry of Justice: Data First – GOV.UK (www.gov.uk) you can see the data available in the Datasets section, including detailed information about what is in the data.
Professor Felix Ritchie of the University of the West of England, who has written extensively about data governance and the 5 Safes, highlighted for me that he sees increasing “well-intentioned, but poorly thought-through” pressure to prescribe research as tightly as possible. His work for the ESRC Future Data Services project sees a shift away from micro-managed projects as highly beneficial – after all, under the current model “the time risk to a researcher of needing a project variation strongly incentivises them to maximise the data request”.

More broadly, the senior leaders who are driving the ONS’s Integrated Data Service pointed out that the 5 Safes should not be seen as separate minimum standards. To a large extent, they should be seen as a set of controls that work in combination – the image of a graphic equaliser to balance the sound quality in a sound system is often given. Any shift to Safe Programmes should be seen in this context – as part of a comprehensive approach to data governance.

Let us know your thoughts

In short, there seems to be scope for exploring this idea further. Indeed, when I floated this idea as part of my keynote speech at the ADR UK conference in November, I got – well, not quite a rapturous reception, but at least some positive feedback.

And even if it’s a small change, of just one word, it is nevertheless a significant step to amend such a well-known and effective framework. So I offer up this suggestion as a starter for debate, as opposed to a concrete proposal for consultation.

Let me know what you think by contacting DG.Regulation@Statistics.gov.uk.