Generating new speech

Time to complete your mastery of using Hugging Face audio models! You'll use a fine-tuned model to generate new speech for a given voice. You will choose a voice from the VCTK Corpus as the basis for the new audio.

The dataset and the SpeechT5ForTextToSpeech model (available as model) have already been loaded, and a make_spectrogram() function has been provided to help with plotting.

This exercise is part of the course

Multi-Modal Models with Hugging Face

Exercise instructions

  • Load a sample speaker embedding from index 5 of the test dataset.
  • Generate the speech from the processed text by specifying the inputs, speaker_embedding, and vocoder.

Interactive exercise

Complete the sample code to finish this exercise.

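import torch
from transformers import SpeechT5HifiGan

# processor, model, and dataset are assumed to be preloaded by the exercise environment
text = "Hi, welcome to your new voice."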

# Load a speaker embedding from the dataset
speaker_embedding = torch.tensor(dataset[5]["____"]).unsqueeze(0)

inputs = processor(text=text, return_tensors="pt")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Generate speech
speech = model.generate_speech(____["input_ids"], ____, ____=____)

make_spectrogram(speech)
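
The generation step described in the instructions can be sketched as a small helper. This is only an illustration: synthesize() is a hypothetical name that is not part of the exercise or of transformers, and it assumes the speaker embedding already has a batch dimension (which is what .unsqueeze(0) adds). The call follows the documented SpeechT5ForTextToSpeech.generate_speech pattern of token IDs, then speaker embeddings, then the vocoder as a keyword argument.

```python
def synthesize(text, processor, model, vocoder, speaker_embedding):
    """Turn text into a waveform for the voice described by speaker_embedding.

    Hypothetical helper for illustration only.
    speaker_embedding is assumed to already carry a batch dimension,
    e.g. shape (1, embedding_dim) after .unsqueeze(0).
    """
    # Tokenize the text into PyTorch tensors; the result includes "input_ids".
    inputs = processor(text=text, return_tensors="pt")
    # Passing a vocoder makes generate_speech return audio samples
    # rather than a mel spectrogram.
    return model.generate_speech(
        inputs["input_ids"], speaker_embedding, vocoder=vocoder
    )
```

With the objects from the exercise, a call would look like speech = synthesize(text, processor, model, vocoder, speaker_embedding), after which speech can be passed to make_spectrogram().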