Custom image editing

Generating AI images is already pretty cool, but some models even support custom image editing: a multi-modal variant of image generation that takes both a text prompt and a source image. Try modifying this famous Van Gogh self-portrait so that it becomes the cartoon character Snoopy, using the StableDiffusionControlNetPipeline:

Figure: famous Van Gogh painting (the source image)

Note: Inference on diffusion models can take a long time, so the generated image has been preloaded for you. Running other prompts will not produce new images.

The Canny-filtered version of the image has already been created for you (canny_image). The StableDiffusionControlNetPipeline and ControlNetModel classes have been imported from the diffusers library. The list of generators (generator) has already been created.
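To give a feel for what an edge-based conditioning image looks like, here is a minimal pure-Python sketch. It is not the Canny filter used to build canny_image: it swaps in a much simpler Sobel gradient threshold, and the function names, the toy image, and the threshold value are all assumptions made for illustration.

```python
# Simplified stand-in for an edge detector (Sobel threshold, NOT full Canny).
# All names and the threshold below are hypothetical, chosen for illustration.

def sobel_edge_map(gray, threshold=100):
    """Return a binary edge map (0 or 255) for a 2D list of grayscale values."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    # Border pixels are left at 0 because the 3x3 kernel does not fit there.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            magnitude = (gx * gx + gy * gy) ** 0.5
            if magnitude >= threshold:
                edges[y][x] = 255
    return edges


def to_three_channels(edges):
    """Repeat the single-channel edge map into identical R, G, B values."""
    return [[(value, value, value) for value in row] for row in edges]


# Toy 6x6 image: dark left half, bright right half, so one vertical edge.
toy = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
edge_map = sobel_edge_map(toy)
conditioning = to_three_channels(edge_map)
```

The result is a white-on-black outline of the source image, repeated across three channels. In the exercise, the prepared canny_image plays this role, and the text prompt decides what gets painted inside the outline.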

This exercise is part of the course

Multi-Modal Models with Hugging Face

Exercise instructions

Load the ControlNetModel from the lllyasviel/sd-controlnet-canny checkpoint.
Load the StableDiffusionControlNetPipeline from the runwayml/stable-diffusion-v1-5 checkpoint, passing in the controlnet you just loaded.
Run the pipeline with the prompt, canny_image, and the provided negative_prompt and generator.

Hands-on interactive exercise

Try this exercise by completing the sample code.

## NOTE: no imports are required for this exercise
# Load a ControlNetModel from the pretrained checkpoint
controlnet = ____("____", torch_dtype=torch.float16)

# Load a pretrained StableDiffusionControlNetPipeline using the ControlNetModel
pipe = ____(
    "____", controlnet=____, torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = ["Snoopy, best quality, extremely detailed"]

# Run the pipeline
output = pipe(
    ____,
    ____,
    negative_prompt=["monochrome, lowres, bad anatomy, worst quality, low quality"],
    generator=____,
    num_inference_steps=20,
)

plt.imshow(output.images[0])
plt.show()
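One possible completion of the template above, following the three instructions, might look like the sketch below. The checkpoint names, prompts, and step count are copied from the exercise; whether those checkpoints still resolve on the Hub is not verified here. The helper names (make_generators, run_snoopy_edit), the seed value, and the deferred imports are assumptions made for illustration, not part of the exercise environment, where the classes and generator are already provided.

```python
# Sketch of a completed exercise. Helper names and the seed are hypothetical.
CONTROLNET_CHECKPOINT = "lllyasviel/sd-controlnet-canny"
BASE_CHECKPOINT = "runwayml/stable-diffusion-v1-5"
PROMPT = ["Snoopy, best quality, extremely detailed"]
NEGATIVE_PROMPT = ["monochrome, lowres, bad anatomy, worst quality, low quality"]


def make_generators(num_prompts, seed=0):
    """Build one seeded torch.Generator per prompt (the seed is an assumption)."""
    import torch  # deferred so that defining this helper does not need torch

    return [torch.Generator(device="cpu").manual_seed(seed) for _ in range(num_prompts)]


def run_snoopy_edit(canny_image, generator=None, device="cuda"):
    """Load ControlNet and the pipeline, then run the edge-conditioned edit."""
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    # Step 1: load the ControlNet weights trained on Canny edge maps.
    controlnet = ControlNetModel.from_pretrained(
        CONTROLNET_CHECKPOINT, torch_dtype=torch.float16
    )
    # Step 2: load the base Stable Diffusion pipeline and attach the ControlNet.
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        BASE_CHECKPOINT, controlnet=controlnet, torch_dtype=torch.float16
    )
    pipe = pipe.to(device)

    if generator is None:
        generator = make_generators(len(PROMPT))

    # Step 3: the prompt and the edge image go in as the first two arguments.
    return pipe(
        PROMPT,
        canny_image,
        negative_prompt=NEGATIVE_PROMPT,
        generator=generator,
        num_inference_steps=20,
    )


# Usage (needs a GPU, the diffusers and torch packages, and a prepared edge image):
# output = run_snoopy_edit(canny_image, generator)
# plt.imshow(output.images[0])
```

The blanks in the template map onto this sketch in order: the class and checkpoint for the ControlNet, the class, checkpoint, and controlnet argument for the pipeline, and then prompt, canny_image, and generator in the pipeline call.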
