Data audit

1. Data audit

After data has been selected and integrated into a model, the next step is to audit it.

2. What we will cover

We're going to discuss data auditing within a project and data validation, and finally, we'll wrap up with a more in-depth look at bias mitigation strategies.

3. Data audit

A data audit is a thorough review of the data, repeated throughout a project's lifecycle, that checks both its technical state and whether it is being handled responsibly. This includes checking data quality, bias, and representativeness. Audits are typically conducted after a major change to the project, and the documentation is updated to record any transformations and when the audit took place. What counts as a major change can vary, but examples include adding new data sources, introducing new preprocessing steps, or updating the model.
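To make the quality and representativeness checks concrete, here is a minimal sketch in Python. It is not taken from the course: the function name `audit_records`, the field names, the sample records, and the `min_group_share` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def audit_records(records, required_fields, group_field, min_group_share=0.1):
    """Return a small audit report for a list of dict records (illustrative only)."""
    # Data quality: count records where a required field is absent or None.
    missing = {
        name: sum(1 for r in records if r.get(name) is None)
        for name in required_fields
    }
    # Representativeness: share of each group among records that have a group value.
    groups = Counter(
        r[group_field] for r in records if r.get(group_field) is not None
    )
    grouped_total = sum(groups.values())
    shares = {g: n / grouped_total for g, n in groups.items()} if grouped_total else {}
    # Flag groups whose share falls below the (assumed) minimum.
    underrepresented = sorted(g for g, s in shares.items() if s < min_group_share)
    return {
        "total": len(records),
        "missing": missing,
        "group_shares": shares,
        "underrepresented": underrepresented,
    }


# Hypothetical sample data, invented for this example.
sample = [
    {"income": 52000, "age_band": "25-34"},
    {"income": None, "age_band": "25-34"},
    {"income": 61000, "age_band": "35-44"},
    {"income": 48000, "age_band": "25-34"},
    {"income": 75000, "age_band": "65+"},
]
report = audit_records(sample, ["income", "age_band"], "age_band", min_group_share=0.25)
print(report["missing"])
print(report["underrepresented"])
```

A report like this could be stored with the audit date in the project documentation. A real audit would add bias checks suited to the project.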

4. Performing data audits

Ideally, data audits are done frequently so that issues are not introduced unintentionally. They guard against complications and the accidental amplification of errors, help us remain transparent and accountable, and build trust with stakeholders.

5. AI financial advisor

Consider an AI financial advisor for personalized financial planning. During interactive chat sessions, the app builds a comprehensive behavior profile, plans goals, and develops a strategy. The app analyzes user-uploaded documents to achieve this. It then matches goals with investment products and creates a tracked and optimized investment portfolio. The project uses data provided by the user and external real-time and historical financial data from Bloomberg.

6. Project data sources

The project uses two data sources. The user data is both qualitative, such as text from the chat sessions, and quantitative, such as data from the uploaded documents. The external data comes from an API and is also quantitative.

7. Data audit setup

We follow the Data Management Plan when performing data audits. In this case, the plan details cultural and contextual differences in the data, the frequency of audits, the required tests, and the people assigned to conduct them.
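One way to picture the audit-related part of such a plan is as a small structured record. The sketch below is only an assumption about how it might be represented. The `AuditPlan` class, its field names, and the example values are hypothetical and do not come from the project.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class AuditPlan:
    """Hypothetical container for the audit details of a data management plan."""
    frequency_days: int   # how often audits are scheduled
    required_tests: list  # names of the checks each audit must run
    auditors: list        # people assigned to conduct the audits
    context_notes: list = field(default_factory=list)  # cultural and contextual differences

    def next_audit(self, last_audit: date) -> date:
        """Return the date the next scheduled audit is due."""
        return last_audit + timedelta(days=self.frequency_days)


# Example values are invented for illustration.
plan = AuditPlan(
    frequency_days=30,
    required_tests=["missing_values", "group_representation"],
    auditors=["data steward"],
    context_notes=["income reported in different currencies"],
)
next_due = plan.next_audit(date(2024, 1, 15))
print(next_due)
```

Keeping the plan in a structured form like this makes it easier to check that every scheduled audit has named owners and a defined set of tests.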

8. Data audits schedule

We conduct the initial data exploration and then regular audits during preprocessing and modeling. We audit the data after any substantial change, focusing on data quality, bias, and representativeness. Before deployment, we conduct an extensive audit to ensure that the final model uses high-quality data and meets all expectations for responsible use. We also check that the data generated by the model complies with our fairness and responsibility policies.

9. Ongoing data audits

Post-deployment, we conduct regular monitoring. This continuous monitoring is essential, and its frequency can range from real-time monitoring to weekly or quarterly reviews, depending on how sensitive and dynamic the data and the environment are. We focus on data quality and compliance, model performance and fairness metrics, data usage and storage, security, and scalability. We also analyze user feedback to identify hidden issues, pay special attention to new patterns and biases in the data, and watch for model drift.

10. Model drift

The deployed model may become less accurate over time because of changes in society, market conditions, or the underlying variables. This is called model drift. To detect model drift, we log predictions and assess performance metrics. If these metrics breach pre-set thresholds, we trigger alerts indicating that a review may be needed. To fix model drift, we check the new data for new biases and retrain the model.
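The detection loop described here (log predictions, compute a metric, compare it with a threshold) can be sketched as follows. This is a minimal illustration, not the project's monitoring system. The `DriftMonitor` class, the rolling-accuracy metric, the window size, the threshold, and the example labels are all assumptions.

```python
from collections import deque


class DriftMonitor:
    """Hypothetical monitor that tracks rolling accuracy over recent predictions."""

    def __init__(self, window_size=100, min_accuracy=0.8):
        # Keep only the most recent outcomes; older ones are dropped automatically.
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy  # pre-set threshold (assumed value)

    def log(self, predicted, actual):
        """Record whether a logged prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return accuracy over the current window, or None if nothing is logged."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        """Return True when accuracy has fallen below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.min_accuracy


# Invented predictions and outcomes, for illustration only.
monitor = DriftMonitor(window_size=4, min_accuracy=0.75)
for predicted, actual in [("buy", "buy"), ("hold", "hold"), ("buy", "sell"), ("hold", "buy")]:
    monitor.log(predicted, actual)

if monitor.needs_review():
    print("Alert: accuracy below threshold, review may be needed")
```

Accuracy is used here only because it is simple. In practice the monitored metrics would likely include fairness measures as well, and an alert would prompt a human review rather than automatic retraining.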

11. AI financial advisor data audits

In the AI financial advisor project, we start with audits during the initial data exploration. Then, we audit as we transform and clean the data to check its technical quality and responsible dimensions. At the modeling stage, we audit to assess fairness and bias, and at the pre-deployment stage, we conduct a full audit for a smooth launch. After that, we use continuous monitoring for compliance, impact review, and model drift.

12. Let's practice!

Now, let's practice!