Object detection
In this exercise, you will use the same Flickr dataset as before, which contains 30,000 images with associated captions. This time, you will find the bounding boxes of the objects the model detects.
The sample image (image) and the pipeline function (pipeline) have been loaded.
This exercise is part of the course Multi-Modal Models with Hugging Face.
Exercise instructions
- Load the object-detection pipeline with the facebook/detr-resnet-50 pretrained model.
- Find the label of the detected object.
- Find the associated confidence score of the detected object.
- Find the bounding box coordinates of the detected object.
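The three fields named in the instructions can be illustrated with a minimal sketch. It assumes the object-detection pipeline returns its usual list of dictionaries, one per detected object, with the keys score, label, and box. The sample_outputs list and the describe helper are hypothetical stand-ins with made-up values, not real model output.

```python
# Hypothetical stand-in for the list returned by pipe(image).
# The values are invented for illustration, not real detections.
sample_outputs = [
    {"score": 0.98, "label": "dog",
     "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.87, "label": "person",
     "box": {"xmin": 150, "ymin": 30, "xmax": 300, "ymax": 400}},
]


def describe(obj):
    """Format one detection dictionary as a readable sentence."""
    label = obj["label"]        # class name predicted for the object
    confidence = obj["score"]   # confidence between 0 and 1
    box = obj["box"]            # corner coordinates in pixels
    return (
        f"Detected {label} with confidence {confidence:.2f} "
        f"at ({box['xmin']}, {box['ymin']}) to ({box['xmax']}, {box['ymax']})"
    )


for obj in sample_outputs:
    print(describe(obj))
```

Each dictionary key is read by name, so the same access pattern should apply to real pipeline output as long as it follows this structure.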
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load the object-detection pipeline
pipe = pipeline("____", "____", revision="no_timm")
# Run detection once; the result holds one entry per detected object
outputs = pipe(image)

# plt, patches, ax, and colors are assumed to be set up by the exercise environment
for n, obj in enumerate(outputs):
    # Find the detected label
    label = ____
    # Find the confidence score of the prediction
    confidence = ____
    # Obtain the bounding box coordinates
    box = ____
    # Cycle through the colors so each box gets its own edge color
    plot_args = {"linewidth": 1, "edgecolor": colors[n % len(colors)], "facecolor": "none"}
    # Rectangle takes an anchor point plus a width and a height
    rect = patches.Rectangle(
        (box["xmin"], box["ymin"]),
        box["xmax"] - box["xmin"],
        box["ymax"] - box["ymin"],
        **plot_args,
    )
    ax.add_patch(rect)
    print(f"Detected {label} with confidence {confidence:.2f} at ({box['xmin']}, {box['ymin']}) to ({box['xmax']}, {box['ymax']})")

plt.show()
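The plotting code converts corner coordinates into the anchor point, width, and height that patches.Rectangle expects. A minimal sketch of that arithmetic on its own, using a hypothetical helper corners_to_rectangle_args and a made-up example_box:

```python
def corners_to_rectangle_args(box):
    """Convert corner coordinates to ((x, y), width, height).

    `box` is assumed to be a dict with xmin, ymin, xmax, and ymax keys,
    matching the box field used in the exercise code.
    """
    x, y = box["xmin"], box["ymin"]
    width = box["xmax"] - x
    height = box["ymax"] - y
    return (x, y), width, height


# Made-up coordinates for illustration only
example_box = {"xmin": 40, "ymin": 70, "xmax": 175, "ymax": 118}
anchor, width, height = corners_to_rectangle_args(example_box)
print(anchor, width, height)  # (40, 70) 135 48
```

When an image is drawn with imshow, the y axis points downward by default, so the (xmin, ymin) anchor corresponds to the top-left corner of the box on screen.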