
Exercise

Audio denoising

In this exercise, you will use data from the WHAM! dataset, which mixes speech with background noise, to generate new speech in a different voice with the background noise removed!

Spectrogram of noisy speech

The example_speech array and the speaker_embedding vector for the new voice have already been loaded. The preprocessor (processor) and vocoder (vocoder) are also available, along with the SpeechT5ForSpeechToSpeech class. A make_spectrogram() function has been provided to help with plotting.

Instructions

  • Load the SpeechT5ForSpeechToSpeech pretrained model using the microsoft/speecht5_vc checkpoint.
  • Preprocess example_speech with a sampling rate of 16000 Hz.
  • Generate the denoised speech using the model's .generate_speech() method.
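
The three steps above can be sketched as follows. This is a minimal sketch rather than the course's reference solution: the helper names load_model() and denoise_speech() are hypothetical, and it assumes processor is a SpeechT5 processor that accepts audio= and sampling_rate= and returns a mapping with an "input_values" entry.

```python
def load_model(checkpoint="microsoft/speecht5_vc"):
    """Load the pretrained speech-to-speech model (hypothetical helper)."""
    # Imported here so that denoise_speech() can be used without
    # transformers installed, e.g. with stand-in objects.
    from transformers import SpeechT5ForSpeechToSpeech

    return SpeechT5ForSpeechToSpeech.from_pretrained(checkpoint)


def denoise_speech(model, processor, vocoder, speech, speaker_embedding,
                   sampling_rate=16000):
    """Convert noisy speech into clean speech in the target voice.

    Hypothetical helper: `speech` is the raw waveform and
    `speaker_embedding` is the vector describing the new voice.
    """
    # Step 2: preprocess the waveform at the stated sampling rate.
    inputs = processor(audio=speech, sampling_rate=sampling_rate,
                       return_tensors="pt")

    # Step 3: generate a spectrogram in the new voice; passing the
    # vocoder makes generate_speech() return a waveform instead.
    return model.generate_speech(inputs["input_values"], speaker_embedding,
                                 vocoder=vocoder)


# Usage inside the exercise, where the objects are already loaded:
# model = load_model()
# speech = denoise_speech(model, processor, vocoder,
#                         example_speech, speaker_embedding)
# make_spectrogram(speech)  # assumed to accept the generated waveform
```

Because the model regenerates the speech from the speaker embedding rather than filtering the input signal, the background noise is not carried over to the output.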