Model evaluation on a custom dataset

In this exercise, you will use an evaluator from the Hugging Face evaluate package to assess the performance of a pretrained model on a custom dataset. Note that for multi-class classification on an imbalanced dataset, accuracy is not a reliable performance indicator: a model that always predicts the majority class can still score high accuracy while being useless on the remaining classes. You will therefore use the evaluator's ability to compute several metrics at once: precision and recall.
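
To see why, consider a quick sketch (not part of the exercise) in which a degenerate model always predicts the majority class. The labels below are illustrative assumptions; accuracy looks respectable, while the weighted precision reported by evaluate exposes the problem.

# Illustrative sketch only -- not part of the exercise code
import evaluate

references  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]  # 80% of samples are class 0
predictions = [0] * 10                        # model always predicts class 0

accuracy = evaluate.load("accuracy")
print(accuracy.compute(predictions=predictions, references=references))
# {'accuracy': 0.8} -- looks decent despite a useless model

clf_metrics = evaluate.combine(["precision", "recall"])
print(clf_metrics.compute(predictions=predictions, references=references,
                          average="weighted"))
# weighted precision is only 0.64, since classes 1 and 2 are never predicted
# (sklearn will warn that precision is ill-defined for the unpredicted classes)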

A dataset (dataset) and a pipeline (pipe) have been predefined. The evaluate library and its evaluator function have also already been imported.
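
Outside the exercise environment, you could create equivalent objects yourself. The dataset and checkpoint names below are stand-ins, not the ones used by the course; note that the beans dataset stores its targets in a "labels" column, so you would pass label_column="labels" to the evaluator's compute() if you used it.

# Stand-in setup (assumption): the course's actual dataset and checkpoint
# are predefined for you; these names are illustrative substitutes.
from datasets import load_dataset
from transformers import pipeline
import evaluate
from evaluate import evaluator

dataset = load_dataset("beans", split="test")   # labels live in a "labels" column
pipe = pipeline("image-classification", model="nateraw/vit-base-beans")
metrics_dict = ["precision", "recall"]          # metric names to combine later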

This exercise is part of the course Multi-Modal Models with Hugging Face.

Exercise instructions

  • Instantiate an evaluator for the "image-classification" task.
  • Extract the mapping from label names to integer IDs from the pipeline's model configuration.
  • Evaluate the pipeline (pipe) on the dataset (dataset), combining the metrics listed in metrics_dict and passing label_map as the label mapping.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Instantiate the task evaluator
task_evaluator = ____("____")

# Use weighted averaging to account for class imbalance
task_evaluator.METRIC_KWARGS = {"average": "weighted"}

# Get label map from pipeline
label_map = pipe.model.config.____

# Compute the metrics
eval_results = task_evaluator.____(
    model_or_pipeline=pipe,
    data=dataset,
    metric=evaluate.____(metrics_dict),
    label_mapping=____,
)

print(f"Precision: {eval_results['precision']:.2f}, Recall: {eval_results['recall']:.2f}")