Model evaluation on a custom dataset
In this exercise, you will use an evaluator from the Hugging Face evaluate package to assess the performance of a pretrained model on a custom dataset. Note that, for multi-class classification on an imbalanced dataset, accuracy is not a reliable performance indicator. You will therefore use the evaluator's ability to compute multiple measures at once: precision and recall.
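As a quick illustration of computing several metrics at once, here is a minimal sketch (not part of the exercise; the toy predictions and references below are invented for demonstration):

import evaluate

# Bundle precision and recall into one combined metric object
clf_metrics = evaluate.combine(["precision", "recall"])

# Toy multi-class example; "weighted" averaging accounts for class imbalance
results = clf_metrics.compute(
    predictions=[0, 1, 2, 2],
    references=[0, 1, 1, 2],
    average="weighted",
)
print(results)  # {'precision': ..., 'recall': ...}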
A dataset (dataset) and pipeline (pipe) have been predefined. The evaluate library and the evaluator function have also already been imported.
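For context, the predefined objects could have been created along the following lines. This is a hypothetical sketch: the checkpoint and dataset names are placeholders, not necessarily those used in the exercise.

from datasets import load_dataset
from transformers import pipeline
from evaluate import evaluator

# Placeholder model and dataset; the exercise uses its own custom dataset
pipe = pipeline("image-classification", model="microsoft/resnet-50")
dataset = load_dataset("beans", split="test")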
This exercise is part of the course Multi-Modal Models with Hugging Face.
Exercise instructions
- Instantiate an evaluator for the "image-classification" task.
- Extract the label-to-integer mapping (label names to class IDs) from the pipeline.
- Evaluate the pipeline (pipe) on the dataset (dataset) using the metrics stored in metrics_dict and the label mapping stored in label_map.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Instantiate the task evaluator
task_evaluator = ____("____")
task_evaluator.METRIC_KWARGS = {"average": "weighted"}
# Get label map from pipeline
label_map = pipe.model.config.____
# Compute the metrics
eval_results = task_evaluator.____(model_or_pipeline=pipe, data=dataset,
                                   metric=evaluate.____(metrics_dict), label_mapping=____)
print(f"Precision: {eval_results['precision']:.2f}, Recall: {eval_results['recall']:.2f}")