Concluding remarks
39. Overall, the DPM lays solid foundations for producing population estimates for England and Wales based on administrative data. The key advantage of the DPM over the current cohort-component-based approach is that the drift related to updating data from the census, and the uncertainty that grows with distance from the last census, can be reduced, and more timely estimates can be produced. This, however, depends almost entirely on the data inputs to the model and on the assessment and understanding of their quality. This assessment is crucial because the relative differences in quality between data sources can influence the model-based ABPEs and their quality in terms of uncertainty and potential biases.
40. The key to producing reliable ABPEs based on the DPM and administrative data is understanding and measuring the under- and overcoverage of the various data sources, and implementing a long-term strategy for providing a high-quality benchmark for the administrative data. One such benchmark is the census, but it is available only for 2011 and 2021. Various other options, such as coverage surveys, an address register and a population register, have been proposed and studied by the ONS. In the absence of a reliable benchmark that can be used to adjust the data inputs or correct for data inadequacies in the model, there is a risk that the ABPEs will still suffer from an error similar in nature to the intercensal drift, for example when coverage adjustments based on the 2021 Census are extrapolated to future releases of the ABPEs.
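To illustrate this risk, the minimal sketch below (in R, with entirely hypothetical figures and a deliberately simplified adjustment step that does not represent the DPM or any ONS method) shows how a coverage adjustment benchmarked once against the 2021 Census and then held fixed turns any subsequent drift in true coverage into a growing bias in the estimates.

```r
# Hypothetical illustration only: a coverage adjustment estimated against the
# 2021 Census and extrapolated unchanged reproduces a drift-like bias if the
# true coverage of the admin source changes over time.

years         <- 2021:2031
true_pop      <- 1e6 * (1 + 0.005)^(years - 2021)    # assumed 0.5% annual growth
true_coverage <- 0.97 - 0.004 * (years - 2021)       # assumed slow decline in admin coverage
admin_count   <- true_pop * true_coverage            # what the admin source records

adj_2021   <- true_coverage[1]                       # adjustment benchmarked to 2021 only
abpe_fixed <- admin_count / adj_2021                 # coverage adjustment extrapolated unchanged

bias <- abpe_fixed - true_pop
data.frame(year = years,
           relative_bias = round(bias / true_pop, 4))
# The relative bias grows roughly linearly with the coverage drift,
# mimicking the intercensal drift of the cohort-component approach.
```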
41. In my opinion, more research is also needed to better understand the nature of errors (biases and uncertainty) in the data sources that are used as inputs to the DPM. This should be done in consultation with a variety of stakeholders and through their engagement with the model development (for example, through a demonstrative R package). The key stakeholders are local authorities, amongst which there are large differences in how admin data capture populations. Such investment may help the development of the DPM and ensure that biases and uncertainty in the ABPEs can be reduced.
42. The differentiation between bias and accuracy that can be present in the data can be built into the established ONS procedures for data quality assessment, such as those based on the European Statistical System and QQI. This would be in line with the recommendations developed regarding the theoretical quality standards for the future population estimates in terms of bias and variance (ONS 2023c). An approach similar to the one in ONS (2023c), considering bias and accuracy separately and for each of the data inputs to the DPM, could be developed. In fact, this differentiation is already being made, for example, by studying overcoverage and undercoverage (i.e. bias) of the SPD (Law et al. 2022; 2023) and, in another study, by developing methods for generating measures of uncertainty for the SPD (ONS 27/07/2020). However, such analyses seem to be unrelated to each other, whereas their outcomes potentially constitute key inputs to the DPM. An assessment of errors could include analysing the data generation process in terms of what may cause a systematic error (e.g. under- or overcoverage of sub-populations, persons systematically not interacting with, or delaying interactions with, an admin system, admin systems being focused on documents rather than persons, etc.) and what can lead to a non-systematic error (e.g. where misclassifying a person as a resident or as a non-resident is equally likely, or misclassification arising in linkage), as illustrated in the sketch below. An example of such a framework is the Total Survey Error framework (e.g. Groves and Lyberg 2010), which, while not directly applicable, provides an overview of how errors in admin data could be described. I appreciate that there will be situations where it is not possible to determine whether a given exclusion rule or data collection mechanism affects bias or accuracy, and I therefore recommend that further research be carried out to better understand those mechanisms, following good practices already established at the ONS, such as simulation studies and clerical checks of samples of data (cf. Law et al. 2023).
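As an example of the kind of simulation study that could support this assessment, the sketch below (again in R, with entirely hypothetical coverage values and a simplified coverage-adjustment step that is not the DPM or any ONS procedure) contrasts a systematic error mechanism, which shifts the coverage-adjusted estimate (bias), with a non-systematic mechanism, which mainly inflates its spread (variance).

```r
# Stylised simulation contrasting systematic and non-systematic error.
# All numbers are hypothetical; the adjustment is a simple ratio correction.

set.seed(1)
n_true           <- 100000   # assumed true resident population
n_sims           <- 1000
assumed_coverage <- 0.97     # coverage ratio used for the adjustment

estimate <- function(true_coverage) {
  admin_count <- rbinom(1, n_true, true_coverage)   # persons captured by the admin source
  admin_count / assumed_coverage                    # simple coverage-adjusted estimate
}

# Systematic mechanism: true coverage is consistently lower than assumed,
# e.g. a sub-population that rarely interacts with the admin system.
est_systematic <- replicate(n_sims, estimate(0.94))

# Non-systematic mechanism: true coverage fluctuates symmetrically around the
# assumed value, e.g. classification errors that cancel out on average.
est_random <- replicate(n_sims, estimate(runif(1, min = 0.95, max = 0.99)))

round(c(bias_systematic = mean(est_systematic) - n_true,   # clear downward shift
        bias_random     = mean(est_random)     - n_true,   # close to zero
        sd_systematic   = sd(est_systematic),              # small spread
        sd_random       = sd(est_random)))                 # much larger spread
```

The error mechanisms of real administrative sources are, of course, far more complex, but even a stylised simulation of this kind can help communicate to stakeholders why the two types of error need to be treated differently as inputs to the DPM.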
43. The model is generally well-described from a technical point of view (bearing in mind that it is still in development), and there is evidence that it has gone through, or is undergoing, testing, as is the norm in a Bayesian inference workflow. Various aspects, such as those pointed out in this report, as well as issues related to the current ONS approach to publishing documentation (results from subsequent updated models not being directly compared or comparable, minor inconsistencies in reporting, varying levels of technical detail), could be presented in a more comprehensive way, with a clear structure covering the data inputs, modelling framework, technical assumptions, computational methods, model testing and analysis of the outputs, all in one document, dashboard or website. For instance, subsequent versions of the DPM could be released in a similar fashion to the SPDs, which are assigned a version number. All of the above elements are already available, although some may require updating or are still in preparation. The resulting coherent documentation will be crucial for the ABPE/DPM project to be sustainable in the future.