
Audio denoising

In this exercise, you will use data from the WHAM dataset, which mixes speech with background noise, to generate new speech in a different voice and with the background noise removed!

Spectrogram of noisy speech

The example_speech array and the speaker_embedding vector for the new voice have already been loaded. The preprocessor (processor) and vocoder (vocoder) are also available, along with the SpeechT5ForSpeechToSpeech class, and soundfile has been imported as sf for saving audio. A make_spectrogram() function has been provided to aid with plotting.
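These objects are preloaded in the exercise environment. Outside of it, they could be created roughly as in the sketch below, using the transformers and datasets libraries; the microsoft/speecht5_hifigan vocoder checkpoint, the Matthijs/cmu-arctic-xvectors dataset, and the example index 7306 are illustrative assumptions, not part of the exercise setup.

import torch
from datasets import load_dataset
from transformers import SpeechT5Processor, SpeechT5HifiGan

# Processor and vocoder matching the SpeechT5 voice-conversion checkpoint
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_vc")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# One x-vector speaker embedding for the target voice (index chosen arbitrarily)
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)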

This exercise is part of the course Multi-Modal Models with Hugging Face.

Exercise instructions

  • Load the SpeechT5ForSpeechToSpeech pretrained model using the microsoft/speecht5_vc checkpoint.
  • Preprocess example_speech with a sampling rate of 16000.
  • Generate the denoised speech using the .generate_speech() method, passing the preprocessed inputs, the speaker_embedding, and the vocoder (a completed sketch is shown after the sample code).

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Load the SpeechT5ForSpeechToSpeech pretrained model
model = ____

# Preprocess the example speech
inputs = ____(audio=____, sampling_rate=____, return_tensors="pt")

# Generate the denoised speech
speech = ____

make_spectrogram(speech)
sf.write("speech.wav", speech.numpy(), samplerate=16000)
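For reference, here is one way the blanks could be filled in. This is a sketch, not an official solution; it assumes the preloaded example_speech, speaker_embedding, processor, vocoder, sf, and make_spectrogram objects described above.

from transformers import SpeechT5ForSpeechToSpeech

# Load the SpeechT5ForSpeechToSpeech pretrained model
model = SpeechT5ForSpeechToSpeech.from_pretrained("microsoft/speecht5_vc")

# Preprocess the example speech at a 16 kHz sampling rate
inputs = processor(audio=example_speech, sampling_rate=16000, return_tensors="pt")

# Generate the denoised speech in the new voice
speech = model.generate_speech(inputs["input_values"], speaker_embedding, vocoder=vocoder)

make_spectrogram(speech)
sf.write("speech.wav", speech.numpy(), samplerate=16000)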