
Object detection

In this exercise, you will use the same Flickr dataset as before, which contains 30,000 images with associated captions. This time, you will find the bounding boxes of the objects that the model detects.

Image: photo of two people, one of whom is playing the guitar

The sample image (image) and the pipeline function (pipeline) have already been loaded.

This exercise is part of the course

Multi-Modal Models with Hugging Face


Exercise instructions

  • Load the object-detection pipeline with the pretrained facebook/detr-resnet-50 model.
  • Find the label of each detected object.
  • Find the confidence score associated with each detected object.
  • Find the bounding box coordinates of each detected object.
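As a rough illustration of the kind of result these steps work with, the sketch below uses a hand-written list in place of real model output. It assumes each detection is a dictionary with "score", "label", and "box" entries, where the box holds xmin, ymin, xmax, and ymax; the numbers and the describe helper are made up for illustration and are not taken from the course.

```python
# Hand-written stand-in for the list returned by an object-detection pipeline.
# The scores and coordinates are illustrative, not real model output.
sample_outputs = [
    {"score": 0.98, "label": "person",
     "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.87, "label": "person",
     "box": {"xmin": 150, "ymin": 30, "xmax": 260, "ymax": 230}},
]


def describe(detection):
    """Summarize one detection (hypothetical helper, not part of any library)."""
    box = detection["box"]
    return (
        f"{detection['label']} ({detection['score']:.2f}) "
        f"at ({box['xmin']}, {box['ymin']}) to ({box['xmax']}, {box['ymax']})"
    )


summaries = [describe(d) for d in sample_outputs]
for line in summaries:
    print(line)
```

Each entry can be read independently, which is why the exercise loops over the results one object at a time.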

Interactive exercise

Complete the sample code to successfully finish this exercise.

# Load the object-detection pipeline
pipe = pipeline("____", "____", revision="no_timm")
# Run the detector once on the sample image
outputs = pipe(image)

for n, obj in enumerate(outputs):
    # Find the detected label
    label = ____
    # Find the confidence score of the prediction
    confidence = ____
    # Obtain the bounding box coordinates
    box = ____
    
    plot_args = {"linewidth": 1, "edgecolor": colors[n], "facecolor": 'none'}
    rect = patches.Rectangle((box['xmin'], box['ymin']), box['xmax']-box['xmin'], box['ymax']-box['ymin'], **plot_args)
    ax.add_patch(rect)
    print(f"Detected {label} with confidence {confidence:.2f} at ({box['xmin']}, {box['ymin']}) to ({box['xmax']}, {box['ymax']})")

plt.show()
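The drawing step above relies on turning corner coordinates into an anchor point plus a width and height, which is the form a rectangle patch takes. The sketch below isolates that arithmetic in plain Python and adds an optional confidence filter. Both helper names (to_rectangle_args, keep_confident), the 0.9 cutoff, and the sample detections are assumptions made for illustration.

```python
def to_rectangle_args(box):
    """Convert corner coordinates into ((x, y), width, height)."""
    width = box["xmax"] - box["xmin"]
    height = box["ymax"] - box["ymin"]
    return (box["xmin"], box["ymin"]), width, height


def keep_confident(detections, min_score=0.9):
    """Keep only detections whose score is at least min_score."""
    return [d for d in detections if d["score"] >= min_score]


# Made-up detections in the same dictionary layout used in the exercise code.
detections = [
    {"score": 0.98, "label": "person",
     "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.62, "label": "bench",
     "box": {"xmin": 0, "ymin": 180, "xmax": 300, "ymax": 240}},
]

confident = keep_confident(detections)
rect_args = [to_rectangle_args(d["box"]) for d in confident]
print(rect_args)
```

Filtering before drawing keeps low-confidence boxes from cluttering the plot; the right cutoff depends on the image and the model.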