Generating new speech
Time to complete your mastery of using Hugging Face audio models! You'll use a fine-tuned model to generate new speech for a given voice. You will choose a voice from the VCTK Corpus as the basis for the new audio. The dataset and SpeechT5ForTextToSpeech model (model) have already been loaded, and a make_spectrogram() function has been provided to aid with plotting.
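The exercise's actual make_spectrogram() implementation is not shown, so here is a minimal sketch of what such a helper might look like, assuming the input is a 1-D waveform array. The framing and FFT parameters (n_fft, hop_length) are illustrative choices, not values from the course.

```python
import numpy as np

def make_spectrogram(speech, n_fft=512, hop_length=128):
    """Compute a log-magnitude spectrogram from a 1-D waveform array."""
    waveform = np.asarray(speech, dtype=np.float32)
    # Slice the waveform into overlapping frames
    n_frames = 1 + (len(waveform) - n_fft) // hop_length
    frames = np.stack([waveform[i * hop_length : i * hop_length + n_fft]
                       for i in range(n_frames)])
    # Windowed FFT magnitude, then log scale for plotting
    window = np.hanning(n_fft)
    magnitudes = np.abs(np.fft.rfft(frames * window, axis=1))
    return np.log10(magnitudes + 1e-10).T  # shape: (freq_bins, n_frames)
```

The returned array can be displayed with matplotlib, e.g. plt.imshow(spec, origin="lower", aspect="auto").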
This exercise is part of the course Multi-Modal Models with Hugging Face.
Exercise instructions
- Load a sample speaker embedding from index 5 of the test dataset.
- Generate the speech from the processed text by specifying the inputs, speaker_embedding, and vocoder.
Interactive exercise
Complete the sample code below to finish this exercise.
import torch
from transformers import SpeechT5HifiGan

text = "Hi, welcome to your new voice."

# Load a speaker embedding from index 5 of the test dataset
# (assumes the embedding is stored under the "xvector" key, as in the
# CMU Arctic x-vector dataset commonly used with SpeechT5)
speaker_embedding = torch.tensor(dataset[5]["xvector"]).unsqueeze(0)

# Tokenize the input text
inputs = processor(text=text, return_tensors="pt")

# Load the HiFi-GAN vocoder to convert spectrograms into waveforms
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Generate speech from the tokenized text, speaker embedding, and vocoder
speech = model.generate_speech(inputs["input_ids"], speaker_embedding, vocoder=vocoder)
make_spectrogram(speech)
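Beyond plotting, you may want to listen to the result. As a sketch (not part of the exercise), the waveform returned by generate_speech can be written to a WAV file with Python's standard library; SpeechT5 produces 16 kHz audio, and the save_speech helper name is an assumption for illustration.

```python
import wave
import numpy as np

def save_speech(speech, path, sampling_rate=16000):
    """Write a float waveform in [-1, 1] to a mono 16-bit PCM WAV file."""
    # generate_speech returns a torch tensor; np.asarray also accepts it
    audio = np.asarray(speech, dtype=np.float32)
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)                 # mono
        f.setsampwidth(2)                 # 16-bit samples
        f.setframerate(sampling_rate)     # SpeechT5 outputs 16 kHz audio
        f.writeframes(pcm.tobytes())
```

Calling save_speech(speech, "voice.wav") then produces a file you can play in any audio player.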