Exercise

Build a video!

Time for you to have a go at creating a video entirely from a text prompt! You'll use a CogVideoXPipeline and the following prompt to guide the generation:

A robot doing the robot dance. The dance floor has colorful squares and a glitterball.

Note: Inference on video generation models can take a long time, so we've pre-loaded the generated video for you. Running different prompts will not generate new videos.

The CogVideoXPipeline class has already been imported for you.

Instructions

  • Create a CogVideoXPipeline from the THUDM/CogVideoX-2b checkpoint.
  • Run the pipeline with the provided prompt, setting the number of inference steps to 20, the number of frames to generate to 20, and the guidance scale to 6.
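
The two instructions above could be sketched roughly as follows. This is a minimal sketch rather than the exercise's reference solution: the half-precision dtype, the CPU offloading call, the output filename, the frame rate, and the helper names generation_kwargs and generate_video are all assumptions added for illustration. Only the checkpoint name, the prompt, and the three generation settings come from the exercise.

```python
PROMPT = (
    "A robot doing the robot dance. "
    "The dance floor has colorful squares and a glitterball."
)


def generation_kwargs(prompt):
    """Collect the settings named in the instructions (hypothetical helper)."""
    return {
        "prompt": prompt,
        "num_inference_steps": 20,  # denoising steps
        "num_frames": 20,           # frames to generate
        "guidance_scale": 6,        # how strongly the prompt steers generation
    }


def generate_video(prompt=PROMPT, output_path="output.mp4"):
    """Load the checkpoint, run the pipeline, and save the result.

    Imports are kept inside the function so the settings above can be
    inspected without torch or diffusers installed.
    """
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    # Assumption: float16 weights to reduce memory use.
    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX-2b", torch_dtype=torch.float16
    )
    # Assumption: offload idle submodules to CPU to fit on smaller GPUs.
    pipe.enable_model_cpu_offload()

    # The pipeline returns a batch of videos; take the first one.
    video = pipe(**generation_kwargs(prompt)).frames[0]

    # Assumption: save at 8 frames per second.
    export_to_video(video, output_path, fps=8)
    return video
```

Calling generate_video() would download the checkpoint and run inference, which can take several minutes even on a GPU; in the exercise environment the video is pre-loaded instead.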