Exercise

Generating new speech

Time to complete your mastery of using Hugging Face audio models! You'll use a fine-tuned model to generate new speech for a given voice. You will choose a voice from the VCTK Corpus as the basis for the new audio.

The dataset and the SpeechT5ForTextToSpeech model (available as model) have already been loaded, and a make_spectogram() function has been provided to help with plotting.

Instructions

  • Load a sample speaker embedding from index 5 of the test dataset.
  • Generate the speech from the processed text by specifying the inputs, speaker_embedding, and vocoder.
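
The two steps above can be sketched roughly as follows. This is a minimal illustration rather than the exercise's reference solution. It assumes the dataset stores each speaker's x-vector under a "speaker_embeddings" column, that inputs is the processor output containing "input_ids", and that vocoder is a loaded vocoder such as SpeechT5HifiGan. The synthesize() helper and its as_tensor parameter are hypothetical names introduced here so the logic can be exercised without PyTorch installed.

```python
def synthesize(model, inputs, dataset, vocoder, speaker_index=5, as_tensor=None):
    """Generate speech for the voice stored at `speaker_index` in `dataset`.

    Assumptions (not confirmed by the exercise text):
    - each dataset row has a "speaker_embeddings" entry holding an x-vector
    - `inputs` is a mapping with an "input_ids" entry (processor output)
    `as_tensor` is a hypothetical hook; it defaults to torch.tensor.
    """
    if as_tensor is None:
        # Imported lazily so the helper can be tried with stand-in objects.
        import torch
        as_tensor = torch.tensor

    # Step 1: load the speaker embedding and add a leading batch dimension,
    # since the model expects shape (batch, embedding_dim).
    speaker_embedding = as_tensor(
        dataset[speaker_index]["speaker_embeddings"]
    ).unsqueeze(0)

    # Step 2: generate the waveform. Passing a vocoder makes generate_speech
    # return audio samples instead of a spectrogram.
    speech = model.generate_speech(
        inputs["input_ids"],
        speaker_embeddings=speaker_embedding,
        vocoder=vocoder,
    )
    return speaker_embedding, speech
```

In the exercise environment this would be called as synthesize(model, inputs, dataset, vocoder), leaving speaker_index at 5 to match the first instruction. The arguments to generate_speech are passed by keyword so the call does not depend on parameter order.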