
Audio denoising

In this exercise, you will use data from the WHAM dataset, which mixes speech with background noise, to generate new speech in a different voice and with the background noise removed!

Figure: Spectrogram of noisy speech

The example_speech array and speaker_embedding vector of the new voice have already been loaded. The preprocessor (processor) and vocoder (vocoder) are also available, along with the SpeechT5ForSpeechToSpeech module. A make_spectrogram() function has been provided to aid with plotting.
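For reference, the preloaded objects might be set up along these lines. This is a minimal sketch with assumed checkpoints (for example, microsoft/speecht5_hifigan for the vocoder) and a hypothetical make_spectrogram helper; the exercise provides the real objects for you, so you do not need to run this.

import matplotlib.pyplot as plt
import numpy as np
import torch
from transformers import SpeechT5Processor, SpeechT5HifiGan

# Preprocessor and vocoder (assumed checkpoints)
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_vc")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# speaker_embedding: a (1, 512) x-vector describing the target voice
speaker_embedding = torch.randn(1, 512)  # placeholder; the exercise supplies a real embedding

def make_spectrogram(waveform, sampling_rate=16000):
    # Plot a magnitude spectrogram of a 1-D waveform (hypothetical helper)
    plt.specgram(np.asarray(waveform), Fs=sampling_rate)
    plt.xlabel("Time (s)")
    plt.ylabel("Frequency (Hz)")
    plt.show()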

This exercise is part of the course

Multi-Modal Models with Hugging Face


Exercise instructions

  • Load the SpeechT5ForSpeechToSpeech pretrained model using the microsoft/speecht5_vc checkpoint.
  • Preprocess example_speech with a sampling rate of 16000.
  • Generate the denoised speech using the .generate_speech() method.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Load the SpeechT5ForSpeechToSpeech pretrained model
model = ____

# Preprocess the example speech
inputs = ____(audio=____, sampling_rate=____, return_tensors="pt")

# Generate the denoised speech
speech = ____

make_spectrogram(speech)
sf.write("speech.wav", speech.numpy(), samplerate=16000)
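For reference, one possible completion of the scaffold above is shown below. It assumes the preloaded processor, vocoder, speaker_embedding, and example_speech described earlier, and that soundfile has been imported as sf.

from transformers import SpeechT5ForSpeechToSpeech

# Load the SpeechT5ForSpeechToSpeech pretrained model
model = SpeechT5ForSpeechToSpeech.from_pretrained("microsoft/speecht5_vc")

# Preprocess the example speech at a 16 kHz sampling rate
inputs = processor(audio=example_speech, sampling_rate=16000, return_tensors="pt")

# Generate the denoised speech in the new voice
speech = model.generate_speech(inputs["input_values"], speaker_embedding, vocoder=vocoder)

make_spectrogram(speech)
sf.write("speech.wav", speech.numpy(), samplerate=16000)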