
Audio denoising

In this exercise, you will use data from the WHAM dataset, which mixes speech with background noise, to generate new speech in a different voice and with the background noise removed!

[Figure: Spectrogram of noisy speech]

The example_speech array and speaker_embedding vector of the new voice have already been loaded. The preprocessor (processor) and vocoder (vocoder) are also available, along with the SpeechT5ForSpeechToSpeech module. A make_spectrogram() function has been provided to aid with plotting.

This exercise is part of the course

Multi-Modal Models with Hugging Face


Exercise instructions

  • Load the SpeechT5ForSpeechToSpeech pretrained model using the microsoft/speecht5_vc checkpoint.
  • Preprocess example_speech with a sampling rate of 16000.
  • Generate the denoised speech using the .generate_speech() method.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Load the SpeechT5ForSpeechToSpeech pretrained model
model = ____

# Preprocess the example speech
inputs = ____(audio=____, sampling_rate=____, return_tensors="pt")

# Generate the denoised speech
speech = ____

make_spectrogram(speech)
sf.write("speech.wav", speech.numpy(), samplerate=16000)
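
For reference, here is one possible way to wire the three steps together. It is a minimal sketch and not necessarily the course's reference solution. The helper names denoise_speech and load_model are hypothetical and do not appear in the exercise. The calls themselves (from_pretrained, calling the processor with audio and sampling_rate, and generate_speech with a vocoder argument) follow the SpeechT5 interface in the transformers library, and the sketch assumes that processor, vocoder, example_speech and speaker_embedding are already loaded as described above.

```python
def load_model(checkpoint="microsoft/speecht5_vc"):
    """Load the pretrained speech-to-speech model (hypothetical helper)."""
    # Imported lazily so that denoise_speech can be used or tested
    # without transformers being installed.
    from transformers import SpeechT5ForSpeechToSpeech

    return SpeechT5ForSpeechToSpeech.from_pretrained(checkpoint)


def denoise_speech(example_speech, speaker_embedding, processor, model, vocoder,
                   sampling_rate=16000):
    """Re-synthesize noisy speech in the target voice (hypothetical helper)."""
    # Step 2: turn the raw waveform into model-ready tensors.
    inputs = processor(
        audio=example_speech,
        sampling_rate=sampling_rate,
        return_tensors="pt",
    )
    # Step 3: generate speech conditioned on the new speaker embedding.
    # Passing a vocoder makes generate_speech return a waveform
    # instead of a spectrogram.
    return model.generate_speech(
        inputs["input_values"],
        speaker_embedding,
        vocoder=vocoder,
    )
```

In the exercise environment this would be used as model = load_model() followed by speech = denoise_speech(example_speech, speaker_embedding, processor, model, vocoder), after which speech can be passed to make_spectrogram() and sf.write() as in the template.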