
Model evaluation on a custom dataset

In this exercise, you will use an evaluator from the Hugging Face evaluate package to assess the performance of a pretrained model on a custom dataset. For multi-class classification with class imbalance, accuracy is not a reliable performance indicator, so you will instead use the evaluator's ability to compute several metrics at once: precision and recall.
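To see why weighted averaging matters here, below is a minimal sketch (the toy labels are invented for illustration) of how evaluate can bundle precision and recall into one metric object and compute both with a weighted average:

import evaluate

# Bundle precision and recall into a single combined metric
clf_metrics = evaluate.combine(["precision", "recall"])

# Toy imbalanced labels: class 0 dominates the references
refs = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
preds = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# "weighted" averages the per-class scores by class frequency;
# zero_division=0 silences warnings for classes that are never predicted
print(clf_metrics.compute(predictions=preds, references=refs,
                          average="weighted", zero_division=0))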

A dataset (dataset) and pipeline (pipe) have been predefined, and the metrics to compute are stored in metrics_dict. The evaluate library and its evaluator function have also been imported.
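In the interactive exercise those objects already exist. If you wanted to reproduce the setup locally, it might look roughly like the following; the dataset and model checkpoint below are stand-ins, not necessarily the ones used in the course:

from datasets import load_dataset
from transformers import pipeline
import evaluate
from evaluate import evaluator

# Hypothetical stand-ins for the predefined objects
dataset = load_dataset("beans", split="test")  # any labeled image dataset
pipe = pipeline("image-classification", model="nateraw/vit-base-beans")
metrics_dict = ["precision", "recall"]  # metric names to combine

Note that if your dataset's label column is not named "label" (the beans dataset, for instance, calls it "labels"), you would also pass label_column to the evaluator's compute() call.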

This exercise is part of the course Multi-Modal Models with Hugging Face.

Exercise instructions

  • Instantiate an evaluator for your image classification task.
  • Extract the label mapping from the pipeline's model configuration (the evaluator expects a mapping from label string to integer ID).
  • Evaluate the pipeline (pipe) on the dataset (dataset) using the metrics stored in metrics_dict and the label mapping stored in label_map.
  • Print the precision and recall from eval_results.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Instantiate the task evaluator
task_evaluator = ____

task_evaluator.METRIC_KWARGS = {"average": "weighted"}

# Get label map from pipeline
label_map = ____

# Compute the metrics
eval_results = ____.____(model_or_pipeline=____, data=____, 
                         metric=evaluate.____(metrics_dict), label_mapping=____)

# Print the precision and recall to 2 decimal places
print(f"Precision: {____:.2f}, Recall: {____:.2f}")