Annex A: Method

Research questions / purpose

The over-arching aim of the Office for Statistics Regulation’s work was to understand how the data sharing and linkage landscape across government has changed since our previous work in this area. Specifically, we wanted to know what progress has been made since our last reports, and what barriers to effective data sharing and linkage still exist. We also wanted to identify examples of good practice and understand the enablers for good work that is taking place.


We spoke to a range of stakeholders from government, the wider public sector and the private sector, representing a range of roles and responsibilities in relation to data sharing and linkage. These included statisticians and analysts working on projects using linked datasets, those actively involved in linking data, those responsible for managing and running trusted research environments (TREs), and those who facilitate data sharing and linkage, for example by funding projects.

To ensure we captured a broad range of views, we spoke both to those directly involved in projects and to senior leaders and others with strategic oversight of projects and programmes.

We identified participants proactively ourselves and through advertising the project on our website and asking interested parties to contact us. After the initial interviews, we also identified further participants via “snowballing”, where those we had already spoken to recommended further key individuals or teams that it would be beneficial for us to engage with. We sought to engage with stakeholders from each of the four UK countries to ensure views from each administration were captured, as well as those involved in data linkage projects at a regional or local level.

We concluded our interviews when we felt that we had reached the point of data saturation, i.e. we were no longer discovering new information in our analyses of the data. After this point, we continued to invite feedback and contributions via email.

To ensure representation across a wide range of views, we grouped stakeholders into four categories:

  • Those using linked datasets
  • Those involved in linking data
  • Those providing data to be linked, but not necessarily carrying out the linkage work themselves
  • Those involved in data sharing and linkage, but not necessarily doing the work directly themselves

In practice, not all stakeholders fitted neatly into just one of the four categories, but this breakdown allowed us to ensure we spoke to a broad range of people across the whole spectrum of data sharing and linkage. A list of the organisations and teams we engaged with is available in Annex B.


We carried out semi-structured interviews to explore stakeholders’ experiences and perceptions of data sharing and linkage. Semi-structured interviews involve open-ended discussion with participants, guided by a pre-determined discussion plan.

We devised an interview schedule containing questions for the four different stakeholder types outlined above. The questions were broadly similar across the four categories, but the wording was altered where appropriate to make them relevant to each type of stakeholder and their specific situation. The questions focussed broadly on what work was being done in relation to data access and linkage, enablers and barriers, and visions for the future. As well as higher level questions, we also devised some prompt / follow-up questions for use where necessary.

The interviews were generally carried out virtually using Microsoft Teams. Two members of the team were present for each interview, with one member leading the conversation and the other taking notes. The interview schedule was used as a guide but generally the conversations took an open format and were allowed to flow to capture the unique experiences of each of the individuals and teams we spoke to. Interviews usually lasted in the region of forty-five minutes to one hour.


We used a thematic analysis approach to identify and develop common themes in the interview data. We started with the broad headings of barriers and facilitators; we then identified further themes and sub-themes and coded the data in line with these themes. We did not pre-determine any categories other than barriers and facilitators, and allowed the themes to emerge from the data itself.

We started the analysis whilst still carrying out our interviews, both so that we could refine the interview questions if necessary and so that we could identify any gaps and attempt to address these by recruiting new participants. The notes from the interviews were divided among the team, who then worked jointly to identify the emerging themes. We then shared our findings with key stakeholders to ensure that they presented an accurate picture of what we had been told.

We developed scenarios and personas based on the themes emerging from the analyses. These are hypothetical situations and characters that depict how things might be in relation to data sharing and linkage in five years’ time. For each of the main barriers and facilitators we identified, we discussed what the outcomes might be depending on progress in these areas. We then created scenarios based on these possible outcomes and the interplay between them. The scenarios depict a range of outcomes, from one where progress has been made in all areas to one where data linkage and sharing has been deprioritised and there has been no real progress at all.
