Part II: Developing and using a model that serves the public good
If you have successfully planned and designed your model, or proposed changes to a current model, you should now consider how best to develop and use that model to serve the public good. When creating statistics or data to inform decisions, the Code of Practice applies in the same way to traditional statistical models, support vector machine models, deep learning models and any other technique. In every case, the purpose of the Code of Practice is to ensure everyone works in a way that demonstrates the three Code pillars: trustworthiness, quality, and value. The following section highlights the relevant principles of the Code of Practice through the lens of models.
Relevance to users
Transparency is a core principle in the Code of Practice; see transparent processes and management. Users should be at the heart of any decision to change the way statistics are produced or decisions made. The development of new models, or updates to current models, can improve the quality of statistics or decisions, and address user need, such as filling known data gaps. You should be clear on the use case for your model and ensure you have engaged with a range of users before introducing changes. You should also communicate any benefits and potential risks, or trade-offs, associated with your model. You must engage widely and be open and transparent about how you have engaged with users, and the lessons learned from your engagement. This engagement should continue once the model is introduced to ensure user need is factored into the continuous development and monitoring of the model. When speaking to users, it is important to be realistic about what can be expected of the model, especially its scope and capabilities.
- A range of users have been engaged with and the model output meets the requirements of their use case.
- How parameters in the model were chosen has been explained to users so that they understand their relevance.
- Any developments which have been paused because of introducing changes have been communicated to users.
Equality of access means that official statistics generated by a model should be available to all users, and not given to some before others. It is good practice to make data used by and generated from models open-access where applicable and appropriate. In some cases, this may not be possible. Model guidance and documentation should be accessible to all, and work with the most commonly used assistive technologies. Any metadata should be made easily accessible to users. If model documentation is held separately from project documentation, each should reference the other. Model code should be findable, accessible, interoperable, and reusable. Open model code should be stored on a site which adopts a high level of accessibility, such as GitHub. You should provide contact information for users who need to raise concerns or ask questions regarding accessibility.
- The model code, data (where possible), and documentation have been made open and accessible to all.
Clarity and insight
You should collaborate with experts in both the type of model being used or developed and the subject matter that the data concern. This is to ensure any new insights drawn from the model are aligned with the experts’ understanding. It will also help you identify potential errors or bias in your model. Explanatory material should be clear about the involvement of any expert oversight groups or relevant stakeholders in the development of the model. Material should also set out the plans for the regular review of the model to ensure it remains fit for purpose.
Where a change in the model changes the statistics, you must support users’ understanding of how to use those statistics appropriately. As your model may be used by others for purposes you did not intend, you should state why the model was designed and for what purpose. A full technical explanation should always be produced for those who wish to understand the technical detail. However, technical documents may not be appropriate for all interested parties and therefore should not be the only method of communication. Consider making use of analogies and visual aids to communicate the model process while still providing access to the essential information. Please consult the Aqua Book chapter ‘The importance and implications of uncertainty’ for more information.
- Ongoing engagement with users has been planned to ensure the model remains fit for purpose.
- The documentation is sufficient to allow users to understand the model and statistics or data produced.
Explanation and interpretation
You must know how your model works and know who you will need to communicate this to. This explanation may be different for different audiences. Explanatory material should be relevant and presented in a clear, unambiguous way that is easily accessible to users. There is a risk that poor communication leads to misuse or misinterpretation of the model. This in turn could lead to over- or under-reliance on its outputs. Public acceptability is related to how explainable your model is. Ask yourself the following questions. Will users need or want to know how the decision was reached or the statistic produced? Will it impact decisions made about them? Or does the outcome of the model provide such a public good that there is public acceptance that the model’s behaviour cannot be explained? For models which impact on decisions made about individuals, the model should be appropriate for this context and purpose.
- It is known how the model is reaching its outcome or decision, and the result is reproducible.
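One way to check that a result is reproducible is to record everything needed to regenerate it and confirm that a rerun gives the same answer. The following is a minimal sketch only, not a prescribed method: the "model" is a hypothetical bootstrap estimate of a mean, and the function names, the seed value and the example data are all assumptions made for illustration.

```python
import hashlib
import json
import random
import statistics


def fit_bootstrap_mean(data, seed, n_resamples=200):
    """Hypothetical 'model': a bootstrap estimate of the mean of `data`."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]
        estimates.append(statistics.fmean(resample))
    return statistics.fmean(estimates)


def run_record(data, seed):
    """Bundle the output with the information needed to reproduce it."""
    data_hash = hashlib.sha256(json.dumps(data).encode("utf-8")).hexdigest()
    return {
        "seed": seed,
        "data_sha256": data_hash,
        "estimate": fit_bootstrap_mean(data, seed),
    }


# Illustrative data only; not taken from any real statistic.
data = [4.1, 3.9, 5.2, 4.8, 4.4, 5.0]
first = run_record(data, seed=2024)
second = run_record(data, seed=2024)

# Same data and same seed should give an identical record.
print(first == second)
```

Publishing a record like this alongside the output lets others confirm they are using the same data and settings before comparing results.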
Model explainability may be challenging, especially for complex models, but it is necessary to show that you have attempted to explain your model. Explaining a traditional statistical model such as a linear regression may be relatively easy, whereas explaining a machine learning model may be more difficult. This is because some machine learning techniques can identify relationships and patterns in the data that are not easily detectable, or understandable, by humans at scale. This means it can be difficult to describe how a machine learning model has reached a decision. You must determine how well you are able to, and need to, explain your model. You should be able to communicate the assumptions built into the model, known biases, how the model is forming its decisions, and the uncertainty inherent in its outputs. For some models, the sophistication of the model’s learned behaviour means that it may be impossible for you to understand how an outcome has been reached. These models are also known as ‘black boxes’ or opaque models. Black box models do not allow for transparency of model decisions. Without appropriate steps being taken to make sure they are interpretable, black box models could damage the trustworthiness of the statistics produced or decisions made. You should be sure that a simpler, more explainable model could not have been used instead.
- The users’ needs for model explainability are known.
- It is understood that failure to explain the model may lead to misuse or misinterpretation of the model.
- If the model cannot be fully explained, it has been explained as much as possible and the reasons why it cannot be entirely explained have been communicated.
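Where a model's internals cannot be read directly, one commonly used way to explain it as much as possible is permutation importance: shuffle one input at a time and measure how much the model's error grows. The sketch below is illustrative only. The `opaque_model` function is a hypothetical stand-in for a black box model, the data are synthetic, and all names are assumptions rather than part of any particular library.

```python
import random


def opaque_model(row):
    """Hypothetical stand-in for a model whose internals are treated as unknown."""
    x1, x2 = row
    return 3.0 * x1 + 0.0 * x2


def mean_squared_error(predict, rows, targets):
    """Average squared difference between predictions and targets."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, feature, seed=0):
    """Increase in error when one feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    permuted = [
        r[:feature] + (value,) + r[feature + 1:]
        for r, value in zip(rows, shuffled)
    ]
    return mean_squared_error(predict, permuted, targets) - baseline


# Synthetic data: the target depends on x1 only.
data_rng = random.Random(1)
rows = [(data_rng.uniform(0, 10), data_rng.uniform(0, 10)) for _ in range(200)]
targets = [3.0 * x1 for x1, _ in rows]

importance = {
    f"x{i + 1}": permutation_importance(opaque_model, rows, targets, i)
    for i in range(2)
}
print(importance)
```

A large increase in error for an input suggests the model relies on it, while an increase of zero suggests the input is ignored. This describes what the model depends on, not why, so it supplements rather than replaces a fuller explanation.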
Interpretability is different to explainability in that it focuses more on the assurances that can be made that the model is of the appropriate quality. Explainability, by contrast, is being able to explain the workings of the model itself. If a model is deemed to be difficult to explain, then it should be interpretable to meet the expectations of the Code of Practice. This means having stringent quality assurance processes that can satisfy developers and those who are accountable for the model. Quality assurance shows that the model is fit for purpose and generating outputs that can be trusted. Some examples of these processes can be found in the Assured Quality section.
- The model is interpretable by all.
- The users can understand the model sufficiently so that they know what effect changes to inputs might have on outputs.
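One simple way to show users how changes to inputs affect outputs is a one-at-a-time sensitivity check: change each input by a fixed amount, hold the others constant, and report the change in the output. The following is a minimal sketch under stated assumptions. The model, its weights and the input names are hypothetical and chosen only for illustration.

```python
def model(inputs):
    """Hypothetical model: a weighted index of two inputs."""
    return 0.7 * inputs["employment_rate"] + 0.3 * inputs["participation_rate"]


def one_at_a_time_sensitivity(predict, baseline_inputs, step):
    """Change each input by `step` in turn and report the change in output."""
    base_output = predict(baseline_inputs)
    effects = {}
    for name in baseline_inputs:
        perturbed = dict(baseline_inputs)  # copy, so the baseline is not modified
        perturbed[name] += step
        effects[name] = predict(perturbed) - base_output
    return effects


# Illustrative baseline values only.
baseline = {"employment_rate": 75.0, "participation_rate": 79.0}
effects = one_at_a_time_sensitivity(model, baseline, step=1.0)
print(effects)
```

This approach changes one input at a time, so it will not reveal effects that only appear when several inputs change together. For models with strong interactions between inputs, a fuller sensitivity analysis may be needed.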