2. Suitable data sources

The Code of Practice for Statistics states that statistics should be based on the most appropriate data to meet intended uses. The effect of any data limitations on use should be assessed, minimised and explained. Adherence to these indicators helps to ensure that data sources are suitable.

2.1 Appropriateness and quality of source data

Indicator 2.1 Statistics are based on data sources that are appropriate for the intended uses. Producers evaluate appropriate quality dimensions in relation to data sources to ensure that statistics are suitable for the intended uses.

A range of data sources is used in the production of UK economic statistics. These can include surveys, administrative data, data gathered from websites or third parties, modelled estimates and alternative data sources, such as web-scraped data or scanner data. In some cases, there may be more than one data source for the same statistics, each with its own strengths and limitations. Producers should evaluate each data source against the most relevant quality dimensions.

The most relevant quality dimensions will vary according to the source. Sets of quality dimensions that could be used to assess the quality of input data include the European Statistical System (ESS) dimensions of quality and the UK Government Data Quality Framework. The ESS dimensions are designed for reporting output quality, but because the inputs to one set of statistics are often the outputs of another, these dimensions may also prove useful for assessing inputs.

For administrative sources, the producer team should have evaluated the completeness of the dataset, both in terms of coverage of the population of interest and the level of missing values, as well as any differences between the concept being collected and the concept of interest, such as national accounting versus commercial accounting concepts. For survey sources, we would expect producer teams to have evaluated the suitability of the sample sizes, questionnaire design, response rates and sampling errors associated with the survey, as well as other dimensions such as timeliness. For modelled estimates, the producer team should have evaluated the assumptions of the model, the sensitivity of estimates to these assumptions and the scale of modelling errors. For other data sources, such as web-scraped or scanner data, producers should identify the most appropriate quality dimensions and ensure that they understand the fitness for purpose of the data source. Our Quality Assurance of Administrative Data (QAAD) toolkit provides guidance to producers on the practices they can adopt to assure the quality of the data they receive, whatever the source.
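As a minimal illustration of two of the dimensions above, coverage of the population of interest and the level of missing values, the Python sketch below summarises a single data delivery. The function name `input_quality_summary`, the record layout (one dictionary per unit, with `None` marking a missing value) and the use of a population count as the coverage denominator are all assumptions made for illustration; this is not part of the QAAD toolkit or any producer's system, and the figures are invented.

```python
from typing import Dict, List, Optional


def input_quality_summary(
    records: List[Dict[str, Optional[float]]],
    population_size: int,
    fields: List[str],
) -> Dict[str, float]:
    """Summarise coverage and item missingness for one data delivery.

    Hypothetical helper: `records` holds one dict per unit received and
    `population_size` is the number of units in the population of
    interest (for example, a count taken from a register).
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")

    # Coverage: share of the population of interest present in the delivery.
    summary = {"coverage_rate": len(records) / population_size}

    for name in fields:
        # Item missingness: share of received units with no value for this field
        # (an absent key is treated the same as an explicit None).
        missing = sum(1 for record in records if record.get(name) is None)
        summary[f"missing_rate_{name}"] = missing / len(records) if records else 1.0

    return summary


# Invented delivery: 4 of 5 units returned, one gap in each field.
delivery = [
    {"turnover": 120.0, "employees": 4},
    {"turnover": None, "employees": 10},
    {"turnover": 75.5, "employees": None},
    {"turnover": 30.0, "employees": 2},
]
print(input_quality_summary(delivery, population_size=5, fields=["turnover", "employees"]))
# {'coverage_rate': 0.8, 'missing_rate_turnover': 0.25, 'missing_rate_employees': 0.25}
```

Figures like these only become meaningful when judged against what the intended use requires; the same rates might be acceptable for one output and not for another.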

This indicator is derived from the first part of Code practice Q1.1. Both the IMF DQAF and the ESS QAF also include indicators relating to the quality of data sources. For example, the ESS QAF includes indicator 6.2, ‘Choices of data sources and statistical methods as well as decisions about the dissemination of statistics are based on statistical considerations’.

Example questions:

  • What data sources are used to produce the statistics?
  • Considering each data source used in the statistics being assessed in turn, and including sources used for adjustments:
    • For survey sources: What are the sample sizes, response rates and sampling errors of the data? Has good practice been followed in the design of the data collection? How do these factors affect the suitability of the data sources for the purpose? How does the producer mitigate any limitations?
    • For administrative sources: What are the coverage and conceptual limitations of the data? Are there missing values for some observations of interest? How do these factors affect the suitability of the data sources? How does the producer mitigate any limitations?
    • For modelled estimates: What are the assumptions and modelling errors associated with the model? What is the quality of the input data to the model? How have the results been quality assured? How do these factors affect the suitability of the data sources? How does the producer mitigate any limitations?
    • For alternative data sources: How have the data been collected? What are the coverage and conceptual limitations of the data in relation to the intended use? What are the sources of potential bias? How do these factors affect the suitability of the data sources? How does the producer mitigate any limitations?
  • For each data source, has the producer completed a QAAD or similar process to assure themselves that the data are appropriate for the intended uses?
  • What information is available to the producer about the quality of the data and is it sufficient to judge quality?
  • What quality dimensions has the producer considered and what are the findings for each? Have any issues been identified?
  • Have any quality dimensions not been considered? (Think about both the ESS dimensions of quality and the quality dimensions in the QAAD.)
  • What feedback have users given about the choice of data sources?
  • Are the statistics based on appropriate data sources for the intended uses?

2.2 Definitions and concepts of data sources

Indicator 2.2 Data sources are based on definitions and concepts that are suitable approximations of what the statistics aim to measure, or that can be processed to become suitable for producing the statistics.

The definitions and concepts used in the source data may not always match those that the statistics aim to measure. This can happen for a variety of reasons, including the ability of respondents to provide the information in a survey, or the purpose for which an administrative data source was set up. In using these data sources, the producer needs to consider whether the concepts and definitions are suitable approximations of what the statistics aim to measure and, if not, whether the data can be processed to become suitable.

This indicator is derived from the second part of the Code practice Q1.1. Both the IMF DQAF and the ESS QAF include indicators for the definitions and concepts in source data approximating the required definitions and concepts. For example, IMF DQAF includes indicator 3.1.2, ‘Source data reasonably approximate the definitions, scope, classifications, valuation, and time of recording required.’

Example questions:

  • What are the concepts and definitions of the data sources?
  • Are they suitable approximations of what the statistics aim to measure?
  • What processing, if any, is required to make them suitable?

2.3 Coherence of source data

Indicator 2.3 Source data are coherent across different levels of aggregation, consistent over time, and comparable between geographical areas, whenever possible. Internal coherence of source data is regularly monitored.

Coherence, consistency and comparability are important dimensions of quality, ensuring that comparisons within and across datasets are robust. If data are not coherent across levels of aggregation, then totals cannot be reconciled with more-granular estimates. If data are not consistent over time, then reliable analysis of trends over time will not be possible, and if data are not comparable between geographical areas, then comparisons between those areas will not be robust. The internal coherence of the source data should be monitored regularly to ensure that it does not deteriorate over time.
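One simple way to monitor coherence across levels of aggregation is to check that each total equals the sum of its components, within a tolerance. The Python sketch below assumes that totals and their components are held in plain dictionaries keyed by area name; the function name `incoherent_totals`, the data layout and the default tolerance are illustrative assumptions, and the figures are invented.

```python
import math
from typing import Dict, List


def incoherent_totals(
    totals: Dict[str, float],
    components: Dict[str, Dict[str, float]],
    rel_tol: float = 1e-6,
) -> List[str]:
    """Return the names of totals that do not equal the sum of their components.

    Hypothetical helper: `components[name]` maps each sub-area to its value.
    A total with no components listed is compared against zero.
    """
    problems = []
    for name, total in totals.items():
        component_sum = sum(components.get(name, {}).values())
        # Compare with a relative tolerance so that rounding noise is not flagged.
        if not math.isclose(component_sum, total, rel_tol=rel_tol):
            problems.append(name)
    return problems


# Invented figures: the UK total reconciles, the England total does not.
totals = {"UK": 100.0, "England": 84.0}
components = {
    "UK": {"England": 84.0, "Scotland": 8.0, "Wales": 5.0, "Northern Ireland": 3.0},
    "England": {"North": 30.0, "Midlands": 25.0, "South": 28.0},
}
print(incoherent_totals(totals, components))  # ['England']
```

Running a check like this in each quality assurance cycle, and keeping a record of the results, is one way a producer could evidence that internal coherence is monitored over time.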

This indicator is derived from the Code practice Q1.4 with wording from the ESS QAF process 14.1.1 added to reflect the monitoring of internal coherence within the source data. The IMF DQAF also includes similar indicators, such as 4.2.1: ‘Statistics are consistent within the dataset.’

Example questions:

  • Are the source data coherent across levels of aggregation, time and geographical area?
  • When was the coherence of the data last monitored? Is coherence considered as a part of regular quality assurance?

2.4 Explanation of data sources

Indicator 2.4 The nature of data sources used, how and why they were selected, and any adjustments applied to them are explained to users.

Where users understand the data sources used to produce statistics, they are better able to judge the quality and suitability of the statistics for their uses. Explaining the data sources used, how and why they were selected, and any adjustments applied to them will aid discussions with users about the quality of the resulting statistics and help to ensure that they are fit for purpose. Transparency about the data sources will also help the producer maintain continuity when there are changes in personnel.

This indicator is derived from the first part of the Code practice Q1.5. Reference to adjustments has been added to reflect that in economic statistics, adjustments are often applied when estimating national accounts concepts. The ESS QAF includes a similar indicator, 6.4: ‘Information on data sources, methods and procedures used is publicly available.’

Example questions:

  • Where are the data sources explained to users?
  • Do these explanations include information on how and why the data sources were selected and any adjustments applied to them?

2.5 Explanation of the quality of source data

Indicator 2.5 Quality of the source data, including potential bias, uncertainty and possible distortive effects, is explained to users and the extent of any impact on the statistics clearly reported.

In addition to explaining to users which data sources are used in the statistics, it is important that the producer also explains the quality of those source data. Things to acknowledge include potential bias, uncertainty or possible distortive effects in the source data. Clearly explaining potential quality issues to users will aid informed discussions on the quality of the resulting statistics for each use, and will also help when there are changes in personnel in the producer team.

This indicator is derived from the second part of the Code practice Q1.5.

Example questions:

  • Where is the quality of the data sources explained to users?
  • Do these explanations include information on potential bias, uncertainty and distortive effects and the impact on the statistics?

2.6 Limitations of data sources

Indicator 2.6 The limitations of data sources are identified and addressed where possible. Statistics producers are open about the extent to which limitations can be overcome and the effect on the statistics.

Understanding the limitations of the data sources used to produce statistics is important for understanding the quality of those statistics. A data source rarely matches the required concepts, coverage and completeness perfectly. Therefore, identifying the limitations of the available data sources, understanding the underlying causes and seeking ways to address them, where possible, will help improve the quality of the statistics. Producers should be open about the extent to which limitations can be overcome and the effect on the statistics so that the quality and fitness for purpose of the statistics are understood.

This indicator is derived from the Code practice Q1.6.

Example questions:

  • What causes of limitations in the data sources have been identified?
  • How have these limitations been mitigated?
  • How have producers been open about the extent to which limitations can be overcome and the effect on statistics?

2.7 Relationships with data suppliers

Indicator 2.7 Producers establish and maintain constructive relationships with those involved in the collection, recording, supplying, linking and quality assurance of data.

The relationship between a producer team and those involved in the collection, recording, supplying, linking and quality assurance of data is key to ensuring the quality of source data. These relationships enable communication that aids the producer’s understanding of the quality of the source data and the supplier’s understanding of the quality dimensions that are important for the intended uses. Concerns around the data are more effectively communicated and resolved where these relationships are strong.

This indicator is derived from the Code practice Q1.2. The ESS QAF includes a related indicator, 8.7, which states ‘Statistical authorities co-operate with holders of administrative and other data in assuring data quality.’

Example questions:

  • What are the relationships between the producer and those collecting, recording, supplying, linking and quality-assuring the data?
  • How do these relationships help ensure the data are suitable and of the required quality?
  • How are those relationships maintained?
  • Have the suppliers raised any concerns around this relationship?

2.8 Statement of data requirements

Indicator 2.8 Producers share a clear statement of data requirements with the organisations that provide that data, setting out decisions on timing, definitions and format of data supply, and explaining how and why the data will be used.

Providing clear statements of data requirements, together with explanations of how and why the data will be used, can help suppliers understand the quality of data required and the types of issues that will most affect the use of the data. These statements can be included in Memoranda of Understanding, Service Level Agreements or similar arrangements to help ensure that appropriate data are supplied, and received, to the required timescales and in the required format. They could also set out how the receiver of the data can raise queries and any quality concerns about the data. Agreeing these aspects in advance will improve the quality of the data and free up resources for other quality improvements, rather than for chasing or reformatting data.

This indicator is derived from the Code practice Q1.3. The ESS QAF includes a similar indicator, 8.6, which states ‘Agreements are made with holders of administrative and other data which set out their shared commitment to the use of these data for statistical purposes.’

Example questions:

  • Does a statement of data requirements exist (for example, a Service Level Agreement or Memorandum of Understanding) for each data source?
  • If so, do they set out decisions on timing, definitions and format and explain how and why the data are used?
  • Has a feedback mechanism been identified for raising any queries or concerns about quality of the data?

2.9 Source metadata

Indicator 2.9 Producers specify and receive appropriate metadata with each data delivery to ensure the quality of the data is understood.

Whilst statements of data requirements set out the required aspects of data quality that apply to all data deliveries, metadata can also provide quality information about an individual data delivery and help the producer understand the quality of that delivery. Depending on the type of data source, the metadata may include response rates, levels of missing data, information on real-world context that affects the data (such as adverse weather) or any quality issues that the supplier has identified. Metadata may also include information on strengths and limitations of the data for their intended use. These metadata facilitate conversations about quality and enable the statistics producer team to understand and explain the quality of the resulting statistics to its users.
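To make the idea of checking metadata on each delivery concrete, the Python sketch below models a metadata record and lists points a producer might follow up with the supplier. Everything here is hypothetical: the `DeliveryMetadata` fields, the set of required fields and the 60% response-rate threshold are illustrative assumptions rather than requirements from this framework or from any real data supply agreement.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical: fields a producer has agreed must accompany every delivery.
REQUIRED_FIELDS = ("response_rate", "missing_value_rate")


@dataclass
class DeliveryMetadata:
    """Hypothetical metadata record sent by a supplier with one delivery."""

    reference_period: str
    response_rate: Optional[float] = None       # share of expected returns received
    missing_value_rate: Optional[float] = None  # share of items left blank
    quality_notes: List[str] = field(default_factory=list)  # e.g. adverse weather


def review_delivery(meta: DeliveryMetadata, min_response_rate: float = 0.6) -> List[str]:
    """List the points a producer might follow up with the supplier."""
    issues = []
    for name in REQUIRED_FIELDS:
        if getattr(meta, name) is None:
            issues.append(f"{meta.reference_period}: metadata field '{name}' not supplied")
    if meta.response_rate is not None and meta.response_rate < min_response_rate:
        issues.append(
            f"{meta.reference_period}: response rate {meta.response_rate:.0%} "
            f"is below the {min_response_rate:.0%} threshold"
        )
    # Supplier-identified issues are passed through so they can be explained to users.
    issues.extend(f"{meta.reference_period}: supplier note: {note}" for note in meta.quality_notes)
    return issues


delivery = DeliveryMetadata("2024 Q1", response_rate=0.55, quality_notes=["adverse weather in week 3"])
for issue in review_delivery(delivery):
    print(issue)
```

A list like this is a starting point for a conversation with the supplier, not a pass or fail verdict on the delivery.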

The Code refers to the provision of metadata to users of statistics but does not explicitly refer to metadata being provided by suppliers of data. This indicator has been included in this framework because a lack of metadata from suppliers can reduce the quality of the statistics that are based on that data source. The ESS QAF includes a related process, 8.6.4: ‘Documentation of administrative and other data. The data holder systematically provides the statistical authorities with documentation/metadata about the content of the administrative and other data as well as the production process of the data (e.g. a methodological document, concepts and definitions, and populations)’.

Example questions:

  • Do metadata accompany each delivery?
  • What metadata are received and how does the producer use the metadata?
  • How do the metadata help the producer understand the quality of the data and communicate it clearly to users?

2.10 Regular review of source data

Indicator 2.10 Producers regularly review data sources to ensure that they continue to be suitable.

In addition to evaluating the quality and suitability of data sources when developing new statistics, producers should regularly review the data sources to ensure that they continue to be suitable. Over time, the quality of a data source can change, for example through falling response rates or changes to the collection of administrative data. In addition, new data sources that could improve quality may become available.

This indicator relates to Code practice Q3.5 on systematic and periodic reviews of the strengths and limitations of data and methods. Both the IMF DQAF and ESS QAF include indicators on regular reviews of data sources, including the sample selections, questionnaires and comprehensiveness of the sources. For example, the IMF DQAF includes indicator 3.2.1, ‘Source data – including censuses, sample surveys and administrative records – are routinely assessed, for example for coverage, sampling error, response error, and non-sampling error; the results of the assessments are monitored and made available to guide statistical processes.’

Example questions:

  • When did the producer last review its data sources?
  • Were any new or emerging data sources identified which may be more suitable to estimate the concept of interest?
  • What were the key findings of those reviews?
  • Are there any data sources which have not been recently reviewed? If so, why not?

2.11 Innovation in sourcing data

Indicator 2.11 Producers are innovative with their approach to sourcing data and consider alternative data sources to facilitate better-quality or timelier statistics, where appropriate.

As technology has improved, the range of data sources available to producers has increased. Producers should be innovative in identifying the most suitable data source for the concept that they are estimating, given the quality dimensions that are important to their users. A non-traditional data source may produce statistics that are timelier or more frequent, or may offer a larger sample, which could improve accuracy or allow for more-granular statistics. At the same time, producers need to ensure that they have considered any negative effects on quality, such as a decrease in relevance, coherence or comparability. Producers will also need to take into account risks to the future stability and supply of the data, and the impact of using the data on their methods. Looking at international practice for measuring the same concept may also help producers be innovative by highlighting the potential of new data sources.

This indicator is aligned with Code principle V4, which encourages innovation and improvement. This principle states that statistics producers should be creative and motivated to improve statistics and data, recognising the potential to harness technological advances for the development of all parts of the production and dissemination process. We have included this indicator in our framework to reflect the drive towards innovation where it can provide improvements in quality and value. The ESS QAF includes indicator 10.3, ‘Proactive efforts are made to improve the statistical potential of administrative and other data sources and to limit recourse to direct surveys’.

Example questions:

  • What innovative ways of sourcing the data, including alternative data sources, has the producer considered?
  • What were the benefits and limitations of using these data? Has there been a transparent evaluation of the effect across all quality dimensions?
  • Were any new ways of sourcing data implemented? If so, what has been implemented, how and why?
  • What are the barriers to investigating and implementing alternative ways of sourcing data?

2.12 Explanation of changes to data sources to users

Indicator 2.12: The effect of changes in the circumstances and context of a data source on the statistics over time should be evaluated. Reasons for any lack of consistency and related implications for use should be clearly explained to users.

Over time, there may be changes in the circumstances or context of a data source. This may be due to changes in the policy environment or changes to a survey such as the sample selection or response rates. The effect of these changes should be evaluated so that their implications for the statistics are understood. If these changes result in a lack of consistency over time or other related implications, then these should be explained to users so that they can assess the continued fitness for purpose of the statistics. Where possible, a consistent time series should be published.
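Where a change in source causes a break in level, one common (but by no means the only) way to publish a consistent time series is ratio linking at an overlap period: earlier values of the old series are scaled so that they join the new series, preserving the old series' growth rates. The Python sketch below shows that technique under stated assumptions: a single overlap period, series held as dictionaries keyed by period label, and labels that sort chronologically as strings (such as "2022Q4"). The function name `ratio_link` and the figures are hypothetical.

```python
from typing import Dict


def ratio_link(old: Dict[str, float], new: Dict[str, float], overlap: str) -> Dict[str, float]:
    """Back-cast an old series onto the level of a new one by ratio linking.

    Earlier values of the old series are multiplied by new[overlap] / old[overlap],
    which keeps their period-on-period growth rates unchanged. Period labels are
    assumed to sort chronologically as strings.
    """
    if overlap not in old or overlap not in new:
        raise KeyError(f"overlap period {overlap!r} must be present in both series")
    factor = new[overlap] / old[overlap]
    linked = {period: value * factor for period, value in old.items() if period < overlap}
    linked.update(new)  # from the overlap period onwards, use the new source as is
    return dict(sorted(linked.items()))


# Invented figures: the new source runs at a higher level from 2023Q1.
old = {"2022Q3": 100.0, "2022Q4": 102.0, "2023Q1": 104.0}
new = {"2023Q1": 110.0, "2023Q2": 112.0}
linked = ratio_link(old, new, "2023Q1")
print(round(linked["2022Q4"] / linked["2022Q3"] - 1, 4))  # 0.02, growth is preserved
```

Linking on a single period makes the back series sensitive to any noise in that period; averaging over a longer overlap, or modelling the break directly, are alternatives a producer might weigh, and whichever is chosen should be explained to users.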

This indicator is derived from the Code practice Q1.7. The ESS QAF also includes indicator 14.2, ‘Statistics are comparable over a reasonable period of time’.

Example questions:

  • Have there been any changes in the circumstances and context of the data sources? If so, what implications are there for the statistics?
  • Where have reasons for a lack of consistency and related implications been explained to users?
  • Are the explanations clear?
  • Has a consistent time series been published, where possible?

2.13 Monitor and minimise burden

Indicator 2.13: Statistics producers are transparent in their approach to monitoring and reducing the burden on those providing their information, and on those involved in collecting, recording and supplying data. The burden imposed should be proportionate to the benefits arising from the use of the statistics.

As set out in the Government Analysis Function guidance on Monitoring and reducing respondent burden, response burden can affect response quality through non-response or attrition from surveys. In addition, where the burden of providing data is high, respondents may experience survey fatigue, which can lower the quality of their responses. The less information people are asked to provide, and the quicker and easier data collections are to complete, the higher the quality of the data is likely to be. Statistics producers should therefore monitor and reduce response burden, balancing it against user need, to help maximise the quality of their data. In a similar way, reducing the burden on those involved in collecting, recording and supplying data, whether from survey, administrative or alternative data sources, will help to ensure the quality of the data.
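Monitoring burden is easier when it is expressed as a number that can be tracked between collection rounds. The Python sketch below is a hypothetical, simplified measure (median reported completion time multiplied by the number of respondents); it is not the method set out in the Government Analysis Function guidance, and the function name and figures are invented for illustration.

```python
from statistics import median
from typing import Sequence


def burden_hours(completion_minutes: Sequence[float], respondents: int) -> float:
    """Estimate total respondent burden, in hours, for one collection round.

    Hypothetical measure: the median reported completion time (less affected
    by a few very long or very short returns than the mean) multiplied by the
    number of respondents.
    """
    if not completion_minutes:
        raise ValueError("need at least one reported completion time")
    return median(completion_minutes) * respondents / 60


# Invented figures for two rounds of the same survey, before and after shortening it.
before = burden_hours([10, 20, 30, 50], respondents=1200)  # median 25 min, 500.0 hours
after = burden_hours([12, 15, 18, 40], respondents=1200)   # median 16.5 min, 330.0 hours
print(f"Burden change: {after / before - 1:.0%}")  # Burden change: -34%
```

A measure like this says nothing on its own about whether the burden is proportionate; that judgement still depends on the benefits arising from the use of the statistics.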

This indicator is derived from Code practice V5.5. It appears in the efficiency and proportionality principle of the Value pillar and, as described above, is key to ensuring quality. The ESS QAF has several indicators around non-excessive burden on respondents under Principle 9, ‘The response burden is proportionate to the needs of the users and is not excessive for respondents. The statistical authorities monitor the response burden and set targets for its reduction over time’.

Example questions:

  • What is the producer’s approach to monitoring and reducing burden on those providing their information?
  • How transparent is the approach? For example, is there public information on it?
  • What is the producer’s approach to monitoring and reducing burden on those collecting, recording and supplying data?

2.14 Collaborate to maximise use of data

Indicator 2.14: Statistics producers communicate and collaborate with others to maximise their use of administrative data, data sharing, cross analysis of sources and the re-use of data to avoid duplicating requests for information.

Re-use of data can help ensure quality by reducing the burden on those collecting, recording and supplying data, providing additional evidence for validation and enabling cross analysis of sources. It also means the data are used for more purposes, which increases the scrutiny and validation they receive. Communicating and collaborating with others, whether they are holders of additional data or potential users, helps maximise the use of the data to deliver these quality benefits.

This indicator is derived from Code practice V5.1, with emphasis on communication and collaboration added. The indicator is supplemented with wording from the ESS QAF around avoiding duplicating requests for information. The ESS QAF also has a similar indicator, 9.5: ‘Data sharing and data integration, while adhering to confidentiality and data protection requirements, are promoted to minimise response burden.’

Example questions:

  • How has the producer communicated and collaborated with others to maximise the use of data?
  • How else has the producer maximised its use of administrative data, data sharing, cross analysis of sources and the re-use of data?
  • What are the barriers to collaborating with others to maximise the use of administrative data, data sharing, cross analysis of sources and the re-use of data?
