Explainability metrics
1. Explainability metrics
The ultimate goal of Explainable AI is to make models trustworthy and interpretable. But how do we assess the reliability of these explanations? Two crucial metrics help us here: consistency and faithfulness.

2. Consistency
Consistency evaluates whether explanations stay stable when a model is trained on different data subsets, such as the training, validation, or test sets, or any other divisions. Low consistency indicates that explanations vary significantly across subsets, suggesting the model’s explanations are not robust. To measure consistency, we split the dataset into parts, two in this case. More splits can provide finer analysis but also increase computation cost, so finding the right balance is key.

3. Consistency
We train our model on each subset,

4. Consistency
and compare the feature importance calculated for each model. One way to do this is by using cosine similarity, which quantifies the similarity between two vectors. The similarity score ranges from -1 to 1.

5. Cosine similarity to measure consistency
A consistency score close to 1 suggests highly consistent explanations,

6. Cosine similarity to measure consistency
values near 0 indicate little to no consistent pattern,

7. Cosine similarity to measure consistency
and values approaching -1 signal completely opposite explanations, highlighting potential issues in the model's training or data preprocessing.
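
As a quick illustration (the importance values below are made up, not from the lesson), scikit-learn's cosine_similarity makes these ranges concrete:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical feature-importance vectors for two models (2D arrays: one row each)
imp_a = np.array([[0.70, 0.30, 0.00, 0.00]])
imp_b = np.array([[0.65, 0.35, 0.00, 0.00]])  # similar ranking of features
imp_c = np.array([[0.00, 0.00, 0.60, 0.40]])  # importance on entirely different features

print(cosine_similarity(imp_a, imp_b))  # close to 1: consistent explanations
print(cosine_similarity(imp_a, imp_c))  # 0: no shared pattern
```

Note that with non-negative importances, such as mean absolute SHAP values, the score stays between 0 and 1; negative scores only appear when signed attributions are compared.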

8. Admissions dataset
Let's compute consistency on the admissions dataset. Suppose the dataset is divided into two parts: X1, y1 and X2, y2, and we have two random forest regressors, model1 and model2, each trained on one subset.
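
As a rough sketch of that setup (the 50/50 split, random_state, and default hyperparameters are assumptions; X and y are assumed to hold the admissions features and target):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Split the admissions data into two halves (assumed 50/50 split)
X1, X2, y1, y2 = train_test_split(X, y, test_size=0.5, random_state=42)

# Train one random forest regressor per subset
model1 = RandomForestRegressor(random_state=42).fit(X1, y1)
model2 = RandomForestRegressor(random_state=42).fit(X2, y2)
```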

9. Computing consistency
First, we import cosine_similarity from sklearn.metrics.pairwise. We create a tree explainer for each model and calculate SHAP values. We compute feature importance by taking the mean of the absolute values across samples. We measure consistency by calculating the cosine similarity between the feature importances, passing them between brackets since the function expects 2D arrays. The consistency of 0.99 indicates that the model’s explanations remain stable across different subsets.
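
The steps just described might look like the following sketch (which subset each explainer is evaluated on is an assumption; the lesson only states that SHAP values are calculated for each model):

```python
import numpy as np
import shap
from sklearn.metrics.pairwise import cosine_similarity

# One TreeExplainer per model
explainer1 = shap.TreeExplainer(model1)
explainer2 = shap.TreeExplainer(model2)

# SHAP values for each model on its own subset (assumption)
shap_values1 = explainer1.shap_values(X1)
shap_values2 = explainer2.shap_values(X2)

# Global feature importance: mean of absolute SHAP values across samples
importance1 = np.abs(shap_values1).mean(axis=0)
importance2 = np.abs(shap_values2).mean(axis=0)

# cosine_similarity expects 2D arrays, hence the extra brackets
consistency = cosine_similarity([importance1], [importance2])[0][0]
print(consistency)  # 0.99 in the lesson's example
```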

10. Faithfulness
Faithfulness measures whether the features identified as important for a given sample actually influence the model’s predictions as expected. Low faithfulness indicates that the identified important features do not significantly impact the model’s output, which can undermine trust in the model’s reasoning. This metric is particularly useful when interpretability and feature attribution are crucial, such as in sensitive applications like healthcare or finance. To derive faithfulness, we first generate a prediction for a sample

11. Faithfulness
and explain it using SHAP or LIME.

12. Faithfulness
Next, we modify the value of one or multiple important features and get a new prediction.

13. Faithfulness
One way to estimate faithfulness is by measuring the absolute change in prediction scores when important features are altered. A noticeable change suggests that these features are truly important, as their alteration strongly impacts the prediction, reinforcing the accuracy of the explanation.
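
A minimal sketch of this idea, assuming a classifier with predict_proba and a single instance stored as a 1D NumPy array (the helper name and perturbation strategy are illustrative, not the lesson's):

```python
import numpy as np

def faithfulness_score(model, x, feature_idx, new_value):
    """Absolute change in the positive-class probability after perturbing
    one feature of a single instance x (1D NumPy array)."""
    original = model.predict_proba(x.reshape(1, -1))[0, 1]
    x_perturbed = x.copy()
    x_perturbed[feature_idx] = new_value
    perturbed = model.predict_proba(x_perturbed.reshape(1, -1))[0, 1]
    return np.abs(perturbed - original)
```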

14. Computing faithfulness
Let’s see this in action. First, we select the first instance to examine and retrieve the model’s original prediction probability for the positive class. Using LIME to explain the prediction, we observe that a GRE score below 309, highlighted in red, is causing a low probability.

15. Computing faithfulness
To validate this, we adjust the GRE score to 310 and obtain the new prediction probability. The probability increases from 0.43 to 0.77, changing from a low chance of acceptance to a high one. Calculating faithfulness, we get a score of 0.34, confirming that changing the GRE score significantly impacts the prediction, thereby validating the importance of this feature.
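
Putting the steps together, a sketch might look like this (X_train, X_test, model, and the column name 'GRE Score' are assumptions about the lesson's environment, not its exact code):

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

# 1. Select the first instance and get the original positive-class probability
instance = X_test.iloc[0]
original_prob = model.predict_proba(instance.values.reshape(1, -1))[0, 1]

# 2. Explain the prediction with LIME
explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=list(X_train.columns),
    mode="classification",
)
explanation = explainer.explain_instance(instance.values, model.predict_proba)
explanation.show_in_notebook()  # the GRE score condition appears in red

# 3. Perturb the flagged feature: raise the GRE score to 310
modified = instance.copy()
modified["GRE Score"] = 310
new_prob = model.predict_proba(modified.values.reshape(1, -1))[0, 1]

# 4. Faithfulness: absolute change in the predicted probability
faithfulness = np.abs(new_prob - original_prob)
print(original_prob, new_prob, faithfulness)  # 0.43, 0.77, 0.34 in the lesson
```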

16. Let's practice!
Let’s practice applying these metrics!