The assessment of the Office for National Statistics’ Admin Based Population Estimates: Independent expertise

Published:
15 July 2024
Last updated:
15 July 2024

Summary of recommendations

The Author

As part of our assessment and to inform our judgements about the suitability and quality assurance of the data and methods used in the Dynamic Population Model, we commissioned this independent review from Professor Arkadiusz Wiśniowski, University of Manchester.

Arkadiusz Wiśniowski, The University of Manchester

Email: a.wisniowski@manchester.ac.uk

Date of report: 02 July 2024

Recommendations I consider essential for the ONS to address

R1. To provide a comprehensive and detailed methods guide that will ensure that the Dynamic Population Model (DPM) is reproducible. The guide should describe in detail:

    • data inputs,
    • modelling framework,
    • assumptions regarding population components,
    • computational methods,
    • model testing, and
    • analysis of the outputs.

The methods guide should contain versioning similar to the versioning of the Statistical Population Dataset (SPD).

R2. To provide in the documentation (R1) a clear differentiation between bias and accuracy (or precision) of the data inputs and assess each data input in terms of bias and accuracy. The assessment should inform the DPM. Such a distinction is essential for the DPM to produce reliable (i.e. unbiased and accurate) population estimates.

R3. To quantify in the documentation (R1) the assumptions in the model; for precision, for example, this could be done by providing coefficients of variation around the mean, rather than stating that one source is more precise than another. The current version of the DPM relies on informative priors, and such quantification is required as an input to the model. It will ensure that the various assumptions can be tested and their impact on the admin-based population estimates (ABPEs) assessed.
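As an illustration only, and not a prescription for how the DPM should implement this, the following Python sketch (with hypothetical source names, CV values and cell sizes) shows how a stated coefficient of variation can be translated into a quantified prior standard deviation, rather than a verbal statement that one source is more precise than another:

    # Illustrative only: hypothetical sources and coefficients of variation (CV).
    # CV = standard deviation / mean, so a stated CV combined with an expected
    # count implies a prior standard deviation that a model can take as input.

    cv_by_source = {"source_A": 0.02, "source_B": 0.10}  # hypothetical CVs

    def implied_prior_sd(expected_count: float, cv: float) -> float:
        """Prior standard deviation implied by a CV around an expected count."""
        return cv * expected_count

    for source, cv in cv_by_source.items():
        sd = implied_prior_sd(10_000, cv)  # e.g. an age-sex-LA cell of ~10,000 people
        print(f"{source}: CV = {cv:.2f} implies a prior sd of roughly {sd:.0f} persons")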

R4. To test and document the impact of using a coverage benchmark in the DPM (Option 1: correct in the data inputs; Option 2: correct in the DPM via model parameters). The documentation should describe which option has been implemented.

R5. To analyse the sensitivity of the ABPEs to a variety of prior distributions assumed for the accuracy (precision) of each of the data inputs. Special attention should be paid to the precision of migration (currently, internal, cross-border and international migration are jointly modelled as in- and out-flows to and from local authorities, LAs). Sensitivity analysis should also be carried out for the prior distributions for the coverage adjustment parameters. These analyses will show whether the ABPEs are robust to the assumptions about data quality and help identify extreme situations where the DPM may require further research.
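As a minimal sketch of the kind of sensitivity analysis meant here, the following Python example uses a deliberately simplified normal-normal stand-in (not the DPM itself, and with hypothetical numbers) to show how the estimate of a single population count moves as the assumed precision of a data input is varied:

    # Simplified stand-in for a prior-sensitivity analysis: one population count,
    # a normal prior on the true count, and a normally distributed data input
    # whose assumed precision (via its CV) is varied. All values are hypothetical.

    prior_mean, prior_sd = 10_000.0, 500.0   # prior belief about the true count
    data_value = 10_600.0                    # observed admin-based count

    for cv in (0.01, 0.05, 0.10):            # assumed CVs for the data input
        data_sd = cv * data_value
        weight = (1 / data_sd**2) / (1 / data_sd**2 + 1 / prior_sd**2)
        posterior_mean = weight * data_value + (1 - weight) * prior_mean
        print(f"assumed CV = {cv:.2f}: posterior mean ~ {posterior_mean:.0f}")

If the resulting estimates barely change across plausible prior settings, the ABPEs can be considered robust to that assumption; large shifts flag the extreme situations mentioned above.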

R6. To continue developing quality assurance processes at each stage of producing the ABPEs, i.e. starting with the production of the data inputs, followed by the assessment of their bias and accuracy, their quantification in terms of data corrections and/or model parameters, and the robustness and sensitivity analyses of the DPM and ABPEs. This is to ensure the sustainability of the DPM if data inputs change or new sources are introduced in the future.

R7. To provide, alongside the DPM-based ABPEs, a statement on the potential sources of uncertainty or bias that are unaccounted for and, where possible, an assessment of their importance in a given situation, e.g. when considering estimates for age groups or LAs.

Further recommendations

R8. To continue research on the data source(s) to be used as a coverage benchmark for the admin-based data used in the DPM (to inform R4).

R9. To develop a process for assessing the quality of all data inputs, in terms of their bias and accuracy, for use in the DPM. Such a process could provide a structure for the data quality assessment (as described in R2) if data collection mechanisms change or new data sources are introduced.

R10. To continue research to better understand the nature of errors (biases and uncertainty) in the data sources that are used as inputs to the DPM. This includes continuing and documenting the simulation studies and other assessments of the data inputs, e.g., the results of simulations carried out for the inclusion rules in the Demographic Index and SPD. This should be done in consultation with a variety of stakeholders and will inform assessments in R2, R3 and R9.

R11. To continue testing the DPM and the resulting ABPEs by using goodness-of-fit measures via prior and posterior predictive checks, which are a typical component of a Bayesian workflow. These checks include predicting the data inputs using a model with only the prior distributions, or using a model estimated from the data inputs, potentially perturbed by random or systematic removal of portions of the data. Such analyses would complement the sensitivity tests described in R5.
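As an illustration only, a prior and posterior predictive check can be sketched with a toy Poisson-gamma count model (not the DPM, and with hypothetical counts): replicate data are simulated using the prior alone and then using the posterior, and both sets of replicates are compared with the observed inputs:

    # Toy Poisson-gamma stand-in for a population count model; all values hypothetical.
    import numpy as np

    rng = np.random.default_rng(1)
    observed = np.array([9800, 10150, 10020, 9890])   # hypothetical LA counts

    # Prior on the underlying count: Gamma(shape a0, rate b0), prior mean 10,000
    a0, b0 = 100.0, 0.01

    # Prior predictive: counts simulated using the prior distribution only
    prior_pred = rng.poisson(rng.gamma(a0, 1 / b0, size=2000))

    # Conjugate posterior update, then posterior predictive draws
    a1, b1 = a0 + observed.sum(), b0 + len(observed)
    post_pred = rng.poisson(rng.gamma(a1, 1 / b1, size=2000))

    for label, draws in [("prior predictive", prior_pred),
                         ("posterior predictive", post_pred)]:
        lo, hi = np.percentile(draws, [2.5, 97.5])
        print(f"{label}: 95% interval [{lo:.0f}, {hi:.0f}] "
              f"vs observed mean {observed.mean():.0f}")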

R12. To develop a comprehensive battery of tests (based on R5 and R11) that can be applied automatically to future versions of the DPM, e.g., if a hierarchical structure is to be included in it.

R13. To continue developing an interactive R package containing a toy model that would demonstrate the workflow of the DPM and permit testing of (some of) the model assumptions. This would also benefit the communication of the estimates, especially to stakeholders interested in a more detailed understanding of the model. It would also permit a more informed consultation with stakeholders if changes to the model or its assumptions are to be introduced. Feedback from stakeholders may also lead to revisions of the DPM.

R14. To develop an interactive dashboard* (accompanying the R package or as a standalone piece of software) that would enable comparisons of the ABPEs with the 2021-Census-based MYEs and across various assumptions in the DPM. The visualisations could be accompanied by estimates of errors (bias, e.g., via the Mean Percentage Error, and precision, e.g., via the Mean Absolute Percentage Error; illustrated in the sketch below) together with a glossary and links to documentation.

*An example of such a dashboard that enables a (visual) comparison of model results is maciej-jan-danko.shinyapps.io/HMigD_Shiny_App_I/
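For illustration, the error measures mentioned in R14 could be computed as in the Python sketch below (with hypothetical ABPE and MYE values); the Mean Percentage Error is signed and so captures the direction of error (bias), while the Mean Absolute Percentage Error captures its typical magnitude (precision):

    # Hypothetical ABPEs and census-based MYEs for a handful of areas.
    abpe = [101_200, 98_700, 250_300, 54_100]
    mye = [100_000, 99_500, 248_000, 55_000]

    n = len(abpe)
    mpe = 100 / n * sum((a - m) / m for a, m in zip(abpe, mye))      # signed: bias
    mape = 100 / n * sum(abs(a - m) / m for a, m in zip(abpe, mye))  # absolute: precision

    print(f"MPE  = {mpe:.2f}%  (positive values indicate over-estimation relative to the MYEs)")
    print(f"MAPE = {mape:.2f}%  (typical size of the percentage error)")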

R15. To engage with local authorities and government stakeholders to learn about specific characteristics of the areas and the types of errors that may be present in the admin data for those areas. This could inform the development of the DPM by, e.g., helping to create typologies of areas that could then be included in the DPM.

R16. To provide a justification for the current implementation of the DPM, which relies on simplified demographic assumptions because of computational difficulties. This could be done by comparing the full model, implemented on a small scale, with the simplified approach, followed by an analysis of the errors, similar to the way the assessment of the computational methods and their impact on the ABPEs has been done.

For consideration

R17. To continue research into methods for quantifying uncertainty in the admin-based migration statistics (international and internal) as these are crucial to producing reliable ABPEs.

R18. To maintain or develop within the ONS the capacity for implementing the computational methods used in and around the DPM, to reduce risks related to future changes in the software packages used in the current implementation of the DPM. This will ensure the sustainability of the DPM.

R19. To consider using other countries' mirror migration statistics, especially for British nationals, for whom the International Passenger Survey is currently used because of the difficulty of capturing them in the admin data.

R20. To consider a risk assessment of under- versus over-predicting the population and its distribution by age, sex and local authority. Such an exercise could inform stakeholders about population characteristics of concern and may also guide the research into data quality and the representation of uncertainty in the DPM.
