Model evaluation on a custom dataset
In this exercise, you will use an evaluator from the Hugging Face `evaluate` package to assess the performance of a pretrained model on a custom dataset. Note that, for multi-class classification on an imbalanced dataset, accuracy is not a reliable performance indicator. You will therefore use the evaluator's ability to report multiple measures at once: precision and recall.
A dataset (`dataset`) and a pipeline (`pipe`) have been predefined. The `evaluate` library and the `evaluator` function have also already been imported.
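To see concretely why accuracy misleads on imbalanced labels, and how `evaluate.combine` bundles several metrics into a single `compute` call, here is a minimal sketch (separate from the exercise; the toy labels and the `macro` average are illustrative assumptions):

```python
import evaluate

# Hypothetical, deliberately imbalanced toy data: 9 samples of class 0,
# 1 sample of class 1, and a degenerate "model" that always predicts 0.
references = [0] * 9 + [1]
predictions = [0] * 10

# Accuracy alone looks strong: {'accuracy': 0.9}
print(evaluate.load("accuracy").compute(predictions=predictions,
                                        references=references))

# evaluate.combine computes several metrics in one call; macro-averaged
# precision (0.45) and recall (0.5) expose the useless classifier.
metrics = evaluate.combine(["precision", "recall"])
print(metrics.compute(predictions=predictions, references=references,
                      average="macro", zero_division=0))
```

The exercise uses `"weighted"` averaging instead, which weights each class's score by its support.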
This exercise is part of the course Multi-Modal Models with Hugging Face.
Exercise instructions
- Instantiate an `evaluator` for your image classification task.
- Extract the label mapping from the pipeline's model configuration.
- Evaluate the dataset (`dataset`) and pipeline (`pipe`) using the metrics stored in `metrics_dict` and the label mapping in `label_map`.
- Print the precision and recall from `eval_results`.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Instantiate the task evaluator
task_evaluator = ____
task_evaluator.METRIC_KWARGS = {"average": "weighted"}
# Get label map from pipeline
label_map = ____
# Compute the metrics
eval_results = ____.____(model_or_pipeline=____, data=____,
                         metric=evaluate.____(metrics_dict), label_mapping=____)
# Print the precision and recall to 2 decimal places
print(f"Precision: {____:.2f}, Recall: {____:.2f}")