Assessing video generation performance
You can assess the performance of your video generation pipelines with a multi-modal CLIP model, which scores the similarity between each video frame and the prompt. You will use this to measure how well the video you generated in the previous exercise matches its prompt.
The load_video() function has been imported from diffusers.utils for you. The clip_score module has also been imported from torchmetrics.
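As a minimal sketch of how the frames could be prepared for scoring, assuming the previous exercise saved its video to a file (the path generated_video.mp4 is a placeholder, and prompt carries over from the previous exercise):

import numpy as np
from diffusers.utils import load_video

# Placeholder path: point this at the video saved in the previous exercise
video_frames = load_video("generated_video.mp4")

# load_video returns PIL images; the scoring loop below expects
# float pixel values in [0, 1], so scale each frame down
frames = [np.array(f).astype("float32") / 255.0 for f in video_frames]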
This exercise is part of the course Multi-Modal Models with Hugging Face.

Exercise instructions
- Set up a CLIP scoring function called clip_score_fn() from the clip_score() metric.
- Calculate the CLIP score between each frame tensor in frame_tensors and prompt.
Interactive exercise
Complete the sample code to successfully finish this exercise.
# Imports (pre-loaded in the exercise environment)
from functools import partial

import numpy as np
import torch
from torchmetrics.functional.multimodal import clip_score

# Setup CLIP scoring
clip_score_fn = partial(clip_score, model_name_or_path="openai/clip-vit-base-patch32")

# Convert each frame (floats in [0, 1]) to a CHW uint8 tensor
frame_tensors = []
for frame in frames:
    frame = np.array(frame)
    frame_int = (frame * 255).astype("uint8")
    frame_tensor = torch.from_numpy(frame_int).permute(2, 0, 1)
    frame_tensors.append(frame_tensor)

# Pass a list of CHW tensors as expected by clip_score
scores = clip_score_fn(frame_tensors, [prompt] * len(frame_tensors)).detach().cpu().numpy()
avg_clip_score = round(np.mean(scores), 4)
print(f"Average CLIP score: {avg_clip_score}")