Simple Science

Cutting edge science explained simply


Auditing Machine Learning Systems: A Practical Approach

A guide to ethical auditing methods for machine learning technologies.



Ethics in ML Auditing: a guide to ensuring responsible machine learning practices.

The use of Machine Learning (ML) systems is growing rapidly, bringing with it ethical issues and public concerns. There is a clear need to audit these systems to ensure they follow ethical standards. For auditing to become standard practice, two prerequisites must be in place: a lifecycle model tailored towards transparency and accountability, and a risk assessment process that properly scopes the audit.

This article explains a practical approach to auditing ML systems that extends the AI-HLEG guidelines published by the European Commission. Our auditing method is based on a lifecycle model that focuses on documentation, accountability, and quality assurance, serving as common ground between the auditors and the organizations being audited.

We describe two pilot studies involving real ML projects, discuss the challenges faced in ML auditing, and suggest future improvements.

The Need for Auditing ML Systems

With the rise of ML technologies, questions about their ethical use and their potential for bias have become pressing. Many organizations have created their own ethical AI guidelines, but these documents often do not change how developers actually make decisions, mainly because the guidelines are too vague and practical tools to support them are usually missing. As a result, developers may feel less responsible for their choices.

One solution to this issue is to apply auditing processes when designing and operating ML systems. Auditing can help ensure accountability and make ethics guidelines more effective. Audits can either be done internally by people in the same organization or by an external party, which is often seen as more trustworthy, especially for high-risk applications.

While external audits can be costly, they often provide more trust to stakeholders. On the other hand, internal audits can promote better documentation and risk assessments, increasing the overall traceability of the systems being audited.

The Challenge of Auditing

Conducting audits effectively requires clear conditions to be in place: both auditors and the audited organizations must understand the expectations and practices of the audit process, including which standards apply, how evidence is collected, what testing looks like, and the roles of the different individuals involved.

However, there is a lack of standard practices for assessing risks in ML systems. We propose a new auditing approach inspired by existing Information Systems auditing practices, tailored to address these challenges.

The Proposed Audit Procedure

Our auditing procedure consists of three main phases: planning, fieldwork/documentation, and reporting. While these phases may seem sequential, it is important to note that auditing ML systems should be an ongoing process, reflecting the rapid changes and iterations that often occur in ML development.

Planning Phase

The planning phase aims to set the scope of the audit and create a roadmap for the following phases. This involves reviewing previous audit reports and conducting a risk assessment. It is crucial to determine the resources and skills needed for the audit, often requiring a team with diverse backgrounds to cover the complexities of ML systems.

A good understanding of the overall system architecture and the processes involved in creating and deploying ML models is key during this phase. To achieve this, we follow a lifecycle model that emphasizes ethical principles and identifies key risks.

Lifecycle Model

A lifecycle model serves as a common reference for both auditors and the organizations being audited. Most existing models focus heavily on technical details and do not sufficiently incorporate ethical principles such as transparency and accountability.

We propose an enhanced lifecycle model that includes four main steps: formalization, data management, model management, and deployment. In addition to these steps, we introduce three important aspects that promote accountability and transparency:

  1. Agility of Each Phase: Instead of viewing the lifecycle as a linear process, our model treats each phase as iterative and emphasizes the need for documentation and quality checks throughout.

  2. Transparency and Accountability: Our model aligns the different phases with the roles and responsibilities of individuals involved, making it clear what documentation should be produced.

  3. Continuous Impact Assessment: The model highlights the importance of ongoing assessments of how the ML system affects its users and the context in which it operates.

Mapping the audited system onto this lifecycle model helps identify relevant phases and documents to collect, as well as tailor the risk assessment accordingly.
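As a minimal sketch of what this mapping could look like in practice, the snippet below ties the four lifecycle steps to responsible roles and the documentation an auditor would collect. The phase names follow the model above, but the roles and artefacts are hypothetical examples rather than a format prescribed by the article:

```python
from dataclasses import dataclass

@dataclass
class LifecyclePhase:
    """One iterative step of the lifecycle model."""
    name: str
    responsible_roles: list[str]    # who is accountable during this step
    required_documents: list[str]   # evidence the auditor expects to collect

# The four main steps named above, with illustrative roles and artefacts.
LIFECYCLE = [
    LifecyclePhase("formalization", ["product owner"], ["problem statement", "impact assessment"]),
    LifecyclePhase("data management", ["data engineer"], ["datasheet", "data quality report"]),
    LifecyclePhase("model management", ["ML engineer"], ["model card", "evaluation report"]),
    LifecyclePhase("deployment", ["operations engineer"], ["monitoring plan", "incident log"]),
]

def documents_to_collect(phases_in_scope: set[str]) -> list[str]:
    """Map the audited system onto the lifecycle: list the evidence to gather."""
    return [doc for phase in LIFECYCLE if phase.name in phases_in_scope
            for doc in phase.required_documents]

print(documents_to_collect({"model management", "deployment"}))
```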

Risk Assessment

Effective auditing relies on having a documented knowledge base of potential risks. For ML systems, this knowledge can be more difficult to gather. Our proposed risk assessment method uses the lifecycle model to simplify the analysis of risks by breaking it down into manageable components.

By using existing frameworks like the European Commission's Assessment List for Trustworthy Artificial Intelligence (ALTAI), we can develop relevant questions to ask during each phase of the lifecycle. This helps ensure that subsequent fieldwork and documentation are guided by these questions.
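As a sketch of this filtering step, the snippet below tags paraphrased ALTAI-style questions with lifecycle phases and selects those relevant to the audit scope. The question wording and the tagging scheme are our own illustrative assumptions:

```python
# Paraphrased ALTAI-style questions tagged by lifecycle phase (illustrative).
QUESTION_BANK = [
    {"phase": "formalization",    "theme": "human agency", "question": "Could users over-rely on the system's output?"},
    {"phase": "data management",  "theme": "privacy",      "question": "Is personal data minimised and lawfully processed?"},
    {"phase": "model management", "theme": "robustness",   "question": "Are performance thresholds defined and tested?"},
    {"phase": "deployment",       "theme": "transparency", "question": "Is model uncertainty communicated to users?"},
]

def questions_for(phases_in_scope: set[str]) -> list[str]:
    """Filter the question bank to the lifecycle phases under audit."""
    return [q["question"] for q in QUESTION_BANK if q["phase"] in phases_in_scope]

for question in questions_for({"model management", "deployment"}):
    print("-", question)
```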

Fieldwork and Documentation Phase

In this phase, the auditor collects evidence to verify compliance with regulations and assess the effectiveness of control measures through various tests. Evidence can be collected using two main mechanisms:

  1. Transparency Mechanisms: This involves reviewing information disclosed by the developers, such as datasheets and model cards.

  2. Examinability Mechanisms: Here, the auditor conducts experiments directly on the system to validate the information provided earlier.

Once evidence is gathered, the auditor can perform compliance testing, which checks for any discrepancies between the organization’s specifications and the actual implementation. This can help identify any weaknesses in quality assessments and documentation.
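A minimal sketch of such a compliance test, assuming the developers' claims come from a model card and the auditor re-measures the same metrics (all names and figures below are hypothetical):

```python
def compliance_check(declared: dict[str, float],
                     measured: dict[str, float],
                     tolerance: float = 0.02) -> list[str]:
    """Flag discrepancies between documented claims and audit measurements."""
    findings = []
    for metric, claimed in declared.items():
        if metric not in measured:
            findings.append(f"{metric}: claimed {claimed} but not re-measured")
        elif measured[metric] < claimed - tolerance:
            findings.append(f"{metric}: claimed {claimed}, measured {measured[metric]:.3f}")
    return findings

# Hypothetical model-card claims vs. the auditor's own test results.
print(compliance_check(declared={"accuracy": 0.95, "recall": 0.90},
                       measured={"accuracy": 0.91, "recall": 0.90}))
```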

Custom Testing

In cases where standard tests are not sufficient, auditors can create their own tests to evaluate aspects of the ML system that may not have been thoroughly assessed by the development team. While this approach can lead to some inconsistencies in the audit, it can be crucial for ensuring that all relevant factors are considered.
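The article does not prescribe concrete custom tests, but as one illustration, an auditor-defined probe might check how stable the system's predictions are under small input perturbations, an aspect a development team may not have assessed:

```python
import random

def stability_probe(predict, inputs, noise=0.01, trials=100):
    """Auditor-defined test: fraction of predictions unchanged under
    small Gaussian perturbations of the input features."""
    stable = 0
    for _ in range(trials):
        x = random.choice(inputs)
        x_noisy = [v + random.gauss(0, noise) for v in x]
        if predict(x) == predict(x_noisy):
            stable += 1
    return stable / trials

# Toy stand-in for the audited model: a simple threshold classifier.
model = lambda features: int(sum(features) > 1.0)
print(stability_probe(model, inputs=[[0.2, 0.9], [0.4, 0.4]]))
```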

Reporting Phase

After the audit is complete, the auditor compiles the results of the various tests and defines criteria for future audits. This could involve scheduling regular audits or conducting them when significant changes occur in the system or user feedback indicates a problem.

The auditor should ensure that any recommended mitigation measures are implemented before the next audit iteration.
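The re-audit criteria could be as simple as the following sketch, where the audit interval and feedback threshold are illustrative assumptions rather than values from the article:

```python
def reaudit_due(days_since_last_audit: int,
                significant_change: bool,
                user_complaints: int,
                max_interval_days: int = 365,
                complaint_threshold: int = 10) -> bool:
    """Trigger a new audit on schedule, after significant system changes,
    or when user feedback indicates a problem."""
    return (days_since_last_audit >= max_interval_days
            or significant_change
            or user_complaints >= complaint_threshold)

print(reaudit_due(days_since_last_audit=120, significant_change=True, user_complaints=2))
```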

Case Studies: Conducting the Pilots

Our auditing procedure has been tested through two pilot studies involving real ML applications. These pilots are not designed to serve as universal templates but rather to encourage dialogue about best practices in ML auditing.

Pilot 1: AI-Assisted Calibration System

This pilot involved an ML system that automates calibration processes for safety components, which have traditionally been conducted manually by engineers. The goal is to support engineers without replacing their expertise.

During the audit, we focused on the formalization, model management, and operationalization processes, since the data management step had already been reviewed. The lifecycle model helped structure discussions and identify relevant documentation.

Risk Assessment

The risk assessment process involved filtering questions relevant to the specific steps of the lifecycle model. Key ethical concerns arising from this audit included transparency, explainability, robustness, and safety.

  1. Transparency and Explainability: The system’s output is clear, but it does not adequately convey the uncertainty associated with its recommendations. Suggested improvements included logging user selections for validation (see the sketch after this list) and running experiments to assess over-reliance on the model’s output.

  2. Robustness and Safety: There is a need for thorough documentation of design decisions, failures, and post-market surveillance to ensure safety. Additionally, recommended performance thresholds should be established.
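The logging suggestion from point 1 could look like the following sketch, which records each model recommendation alongside the engineer's actual choice so over-reliance can be analysed later (the file name and record fields are hypothetical):

```python
import json, time

def log_selection(logfile, recommendation, confidence, user_choice):
    """Append one interaction record: the model's recommendation, its
    confidence, and what the engineer actually selected."""
    record = {
        "timestamp": time.time(),
        "recommendation": recommendation,
        "confidence": confidence,    # the uncertainty the UI should also display
        "user_choice": user_choice,
        "overridden": user_choice != recommendation,
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")

log_selection("selections.jsonl", recommendation="setting_A",
              confidence=0.72, user_choice="setting_B")
```

A high rate of un-overridden low-confidence recommendations in such a log would be one signal of over-reliance.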

Pilot 2: Geriatronics Project - Vision System

The second pilot focused on the vision module of GARMI, a robotic platform designed to assist elderly people. This project was still in the research stage, and while ethical concerns had been previously addressed, there was insufficient documentation for full auditing.

Risk Assessment

We identified several ethical requirements from the ALTAI that needed to be documented. The audit identified the need for better traceability of data and model management processes, as well as ensuring compliance with privacy regulations.

Documentation templates could help streamline the process, allowing the team to focus on integrating ethical considerations into system development moving forward.

Lessons Learned from the Pilots

The pilots highlighted several key takeaways for future auditing of ML systems:

  1. Auditability Criteria: Not all systems are suitable for auditing due to restrictions on data access or lack of necessary documentation. Establishing auditability criteria prior to planning is crucial.

  2. No One-Size-Fits-All Solution: Different ML systems may require varied auditing approaches based on their risk levels and specific contexts.

  3. Continuous Auditing: Early engagement between auditors and development teams can lead to better documentation and reduced compliance costs. Ongoing collaboration throughout the ML system’s lifecycle may help catch issues earlier and simplify future audits.

  4. Database of Documented Risks: Maintaining a database of past incidents and their associated risks can inform future audits and aid in developing effective preventive measures (a minimal sketch follows this list).
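As a minimal sketch of such a risk database, using SQLite for self-containment (the schema is our own assumption):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # a file path would persist across audits
con.execute("""CREATE TABLE incidents (
    id INTEGER PRIMARY KEY,
    lifecycle_phase TEXT,   -- where in the lifecycle the risk materialised
    description TEXT,       -- what went wrong
    mitigation TEXT         -- preventive measure adopted afterwards
)""")
con.execute("INSERT INTO incidents (lifecycle_phase, description, mitigation) VALUES (?, ?, ?)",
            ("deployment", "users over-relied on uncalibrated scores",
             "display uncertainty and log user overrides"))

# Future audits can query past incidents by phase to seed the risk assessment.
for description, mitigation in con.execute(
        "SELECT description, mitigation FROM incidents WHERE lifecycle_phase = ?", ("deployment",)):
    print(description, "->", mitigation)
con.close()
```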

Conclusion

This article presents a practical approach to auditing machine learning systems, focusing on creating a common understanding between practitioners and auditors. Our proposed process includes a lifecycle model and risk assessment method that integrates ethical principles to enhance accountability and transparency.

We demonstrated our methodology through real-world examples and discussed the challenges that remain in this field. As the ML landscape evolves, developing standardized practices and adapting our auditing methods will be essential for ensuring ethical and trustworthy AI systems.

Original Source

Title: Pragmatic auditing: a pilot-driven approach for auditing Machine Learning systems

Abstract: The growing adoption and deployment of Machine Learning (ML) systems came with its share of ethical incidents and societal concerns. It also unveiled the necessity to properly audit these systems in light of ethical principles. For such a novel type of algorithmic auditing to become standard practice, two main prerequisites need to be available: A lifecycle model that is tailored towards transparency and accountability, and a principled risk assessment procedure that allows the proper scoping of the audit. Aiming to make a pragmatic step towards a wider adoption of ML auditing, we present a respective procedure that extends the AI-HLEG guidelines published by the European Commission. Our audit procedure is based on an ML lifecycle model that explicitly focuses on documentation, accountability, and quality assurance; and serves as a common ground for alignment between the auditors and the audited organisation. We describe two pilots conducted on real-world use cases from two different organisations and discuss the shortcomings of ML algorithmic auditing as well as future directions thereof.

Authors: Djalel Benbouzid, Christiane Plociennik, Laura Lucaj, Mihai Maftei, Iris Merget, Aljoscha Burchardt, Marc P. Hauer, Abdeldjallil Naceri, Patrick van der Smagt

Last Update: 2024-05-21

Language: English

Source URL: https://arxiv.org/abs/2405.13191

Source PDF: https://arxiv.org/pdf/2405.13191

Licence: https://creativecommons.org/licenses/by-sa/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
