“Wouldn’t it be cool if…

…we could look at this against x! And y. And maybe a, b and c too…”

This felt like quite a common conversation with my team, back when I was analysing data in the Department for Digital, Culture, Media and Sport (DCMS) circa 2015.

The number of interesting questions we could answer and analyses we could do with our data, if we could only put it together with other data, felt potentially limitless. And what an amazing benefit these analyses could bring to society – we’d basically be able to understand and improve everything!

But it wasn’t meant to be. We did try and match our survey data with data held by one other department and… it was painful! It took months to get to the point of being able to physically share and receive data and, once we had some data, getting it ready to analyse proved tricky too. In fact, it proved so difficult that, I’m ashamed to admit, I moved roles before I managed it.

OSR continues to emphasise the power of linked data to produce better statistics. On paper, linking data sets might sound simple but, in practice, it is often difficult. This is why I’m so excited about the recent work we’ve seen from the Ministry of Justice (MoJ). MoJ is taking great steps to link up the administrative data sets it generates in its operational work, and to make them available for analysis by people outside of the department. This means that MoJ, and other interested parties, can more easily do analysis across different parts of the justice system, and beyond, to understand the journeys individuals take.

There are two projects I’d like to highlight:

1. Data First

In collaboration with ADR UK (Administrative Data Research UK), MoJ is undertaking an ambitious data linkage project called ‘Data First’. OSR’s 2018 review of The Public Value of Justice Statistics highlighted the need for statistics that move from counting people as they interact with specific parts of the justice system to telling stories about the journeys people take. Data First is doing just that! It will anonymously link data from across the family, civil and criminal courts in England and Wales, enabling research on how the justice system is used and enhancing the evidence base to understand ‘what works’ to help tackle social and justice policy issues.

In June, we were delighted to hear that Data First reached its first major milestone. The first research-ready dataset – a de-identified, case-level dataset on magistrates’ court use – was made available for accredited researchers through the Office for National Statistics (ONS) Secure Research Service (SRS). This data provides insight into the magistrates’ court user population, including the nature and extent of repeat users. For the first time, it enables researchers to establish whether a defendant has entered the courts on more than one occasion, and it will drive better policy decisions to reduce frequent use of the courts. In August, a second output followed, this time a de-identified, research-ready dataset on Crown Court use. This dataset is also available through the SRS.

2. Data shares with the Department for Education (DfE)

To improve understanding of the potential links between individuals’ educational outcomes and characteristics and their involvement or risk of involvement with crime and the criminal justice system, MoJ and DfE have created a de-identified, individual-level dataset, which links data from the Police National Computer (MoJ) and the National Pupil Database (DfE)[1]. The DfE data spans educational attainment, absence from school, exclusions and characteristics like special educational needs and free school meals eligibility. The MoJ data includes information on criminal histories and reoffending, court proceedings, prison and assessments of offenders. Linking this data will allow analysis that has previously not been possible, including: longitudinal analysis of trends in individuals’ characteristics and outcomes; analysis to inform the design of policies and processes that better support those at risk; and evaluations of the effectiveness of interventions. Accredited researchers can apply to access the data via the ONS SRS or MoJ’s Justice MicroData Lab.

This work follows The Children in Family Justice Data Share (CFJDS)[2], which started in 2012 and has resulted in a database of child-level data linked from across the MoJ, DfE and the Children and Family Court Advisory and Support Service (Cafcass). The CFJDS provides, for the first time, longitudinal data on the short and medium-term outcomes for children who experience the family justice system. The data are being used to build understanding of how different experiences and decisions made within the family court can impact on children’s educational outcomes, and subsequently, their life chances. In turn, they will provide more robust evidence on which to make policy decisions for children and their families.

What’s really exciting about both these projects is the way that the teams involved are tackling the challenges of data linkage. Instead of creating a big new IT system to try and join up the data, these projects are starting from a position of, “let’s take what’s in the current databases and see what we can get through anonymised matching.” The exact tools used vary between teams and departments but include established tools such as SAS Data Management Studio and SQL Server Management Studio (SSMS), which were used by MoJ and DfE respectively for linking crime and justice data and National Pupil Database (NPD) data. For the data linkage done as part of Data First, MoJ has developed a new tool called Splink, which was written in the programming language Python. Splink is an open source library for probabilistic record linkage at scale: it’s free, and MoJ hopes others in government (and beyond) will find it useful for their own data linkage and deduplication tasks. Rule-based matching algorithms, including ‘fuzzy-matching’ algorithms – rules used to link data based on non-perfect matches between data variables – have been used to link individuals within and between data sets.
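To give a flavour of what a rule-based ‘fuzzy-matching’ rule can look like, here is a minimal sketch in Python. It is my own illustration, not the method used by MoJ or DfE and not Splink’s API (Splink takes a probabilistic approach). The field names, the example records and the 0.85 threshold are all made up for the example, and it uses only Python’s standard library.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-to-1 similarity ratio, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_match(rec_a: dict, rec_b: dict, name_threshold: float = 0.85) -> bool:
    """Hypothetical rule: dates of birth must agree exactly; names may differ slightly."""
    if rec_a["dob"] != rec_b["dob"]:
        return False
    return similarity(rec_a["name"], rec_b["name"]) >= name_threshold


def link(left: list, right: list) -> list:
    """Compare every pair of records and return the (left id, right id) pairs that match."""
    return [(a["id"], b["id"]) for a in left for b in right if is_match(a, b)]


# Invented example records: one spelling variation, one clash on date of birth.
courts = [
    {"id": "C1", "name": "Jonathan Smith", "dob": "1990-04-12"},
    {"id": "C2", "name": "Maria Garcia", "dob": "1985-11-03"},
]
schools = [
    {"id": "S1", "name": "Jonathon Smith", "dob": "1990-04-12"},
    {"id": "S2", "name": "Maria Garcia", "dob": "1979-01-20"},
]

links = link(courts, schools)
print(links)
```

In this toy example only C1 and S1 are linked: the names differ by one letter but the dates of birth agree, whereas the two ‘Maria Garcia’ records have identical names but different dates of birth. Real linkage work would also need to de-identify the data and avoid comparing every record with every other record, which this sketch does not attempt.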

These projects show what can be achieved when government departments, agencies and external organisations work together, and will help us start to achieve what my team and I hoped we could back in 2015. They will enable us to better understand individuals and society and, in turn, to make better decisions and policies, which will improve the justice system and outcomes for all individuals. I’m looking forward to seeing what comes next.


[1] To ensure the confidentiality and protection of data about children, access to DfE data extracts from the NPD is managed through tightly controlled processes.

[2] https://www.gov.uk/government/statistics/family-court-statistics-quarterly-october-to-december-2017, published 29 March 2018

Having a better public debate about crime

Pat MacLeod, lead regulator for crime and justice statistics in the Office for Statistics Regulation, writes about why it’s not easy to say what’s happening to crime and the Office for National Statistics’ (ONS) efforts to improve the public debate.


Nearly everyone is interested in hearing about crime. Questions that the public might ask like ‘is crime going up?’ or ‘is there more violent crime now?’ sound deceptively simple. Yet answering questions on the amount of crime and how it is changing is not easy. It is ONS’s job to do this in England and Wales.

Crime is the combination of individual acts that are defined as against the law, the make-up of which changes over time. Some things commonly described as crimes, like knife crime, are in fact a collection of legally defined crimes such as homicide, robbery or assault that involve a knife. And by its nature, crime covers lots of things that are secretive or hidden. So, as well as being hard to define, crime is also difficult to measure.

To keep it simple – and I’ve simplified what follows a lot – let’s look at how ONS measures the sorts of crime experienced by the general adult population: things that we might think of as conventional crime, like theft, assault and vandalism. The Crime Survey for England and Wales is as close as you can get to measuring the amount of crime that the general adult population experience and how many people experience crimes in those countries. It tells us about crimes that the police know about as well as the ones that have never come to the notice of the police. It is not so good at telling us about crimes that don’t happen very often across the adult population, like robbery.

Statistics on crimes that the police record are the other main way ONS measures crime in England and Wales. These statistics count the number of crimes police are aware of and have officially recorded. Naturally, they don’t include crimes the police aren’t aware of. Just now, more people are telling police forces about some crimes like domestic abuse and sexual offences. The police don’t correctly record every incident they should as a crime, although there is some evidence they are getting better at doing this. All of this means that, when the police record more crime, it might indicate increasing demand or improving processes, but it doesn’t automatically follow that crime has increased. It needs careful investigation before that link can be made.

For a long time, from the early 1990s, it looked like conventional crime was going down in England and Wales. In 2014, when the crime survey still showed crime going down, the numbers of crimes recorded by the police started to increase. This was mostly due to police forces in England and Wales getting better at recording the crimes they were made aware of.

Hearing this, you might be forgiven for thinking that it would be best not to rely on crimes recorded by the police if you want to find out what is happening to crime in England and Wales. But that’s not the whole story. The crime survey has a time lag which makes it hard to spot when things start to change. Despite the limitations of statistics on crimes recorded by the police, recent increases in the recorded numbers of crimes that the police record well, like those involving a knife, gave an early indication that these crimes were increasing.

So, it’s not easy to answer those deceptively simple questions about crime. What we can confidently say, though, is that interest in hearing about crime will continue. That’s why we will continue to support ONS’s efforts to ensure the public is properly informed by statistics about crime, and continue to speak out whenever we see that the public debate is not well served.

Late last year we wrote to ONS encouraging them to look at more ways to improve the value of ONS’s crime statistics to the public debate. Since then, I’m pleased to say, we have seen steady improvement in the way ONS reports what is happening to crime, and I think their last two publications – for year ending June 2018 and year ending March 2018 – are the clearest yet. I especially like their focus on how particular crimes are changing and on how it is mercifully rare for most people to be a victim of most crime. It is an ongoing challenge for ONS – and there are always exceptions – but I would say that their approach is starting to create the conditions for a better-informed public debate.