Audio denoising
In this exercise, you will use data from the WHAM dataset, which mixes speech with background noise, to generate new speech in a different voice and with the background noise removed!

The example_speech array and speaker_embedding vector of the new voice have already been loaded. The preprocessor (processor) and vocoder (vocoder) are also available, along with the SpeechT5ForSpeechToSpeech module. A make_spectrogram() function has been provided to aid with plotting.
This exercise is part of the course
Multi-Modal Models with Hugging Face
Exercise instructions
- Load the `SpeechT5ForSpeechToSpeech` pretrained model using the `microsoft/speecht5_vc` checkpoint.
- Preprocess `example_speech` with a sampling rate of `16000`.
- Generate the denoised speech using `.generate_speech()`.
Interactive hands-on exercise
Try this exercise by completing the sample code below.
# Load the SpeechT5ForSpeechToSpeech pretrained model
model = ____
# Preprocess the example speech
inputs = ____(audio=____, sampling_rate=____, return_tensors="pt")
# Generate the denoised speech
speech = ____
make_spectrogram(speech)
sf.write("speech.wav", speech.numpy(), samplerate=16000)