Audio preprocessing

In this exercise, you will learn how to adjust the sampling rate of audio data and how to use an automatic preprocessor. You will be working with the VCTK Corpus, which includes around 44 hours of speech data uttered by 110 English speakers with various accents.

The dataset has already been loaded.

This exercise is part of the course

Multi-Modal Models with Hugging Face

Exercise instructions

  • Resample the audio in the dataset to 16,000 Hz using the .cast_column() method.
  • Load the audio processor using the pretrained openai/whisper-small model.
  • Preprocess the audio data of the first datapoint, specifying the same sampling rate and padding=True.
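
To make the resampling step concrete, here is a minimal, standard-library-only sketch of what changing the sampling rate does to a waveform. It is not how the dataset library resamples: the function resample_linear is a hypothetical helper that uses plain linear interpolation, whereas production resamplers apply band-limited filtering. The 48,000 Hz source rate and the 440 Hz test tone are assumptions chosen for illustration.

```python
import math


def resample_linear(samples, orig_sr, target_sr):
    """Resample a 1-D list of floats by linear interpolation (illustrative only)."""
    if not samples:
        return []
    # The duration stays the same, so the sample count scales with the rate.
    n_out = round(len(samples) * target_sr / orig_sr)
    step = orig_sr / target_sr  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), len(samples) - 1)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


orig_sr = 48_000    # assumed source sampling rate (Hz)
target_sr = 16_000  # rate requested in the exercise (Hz)

# One second of a 440 Hz sine tone at the source rate.
wave = [math.sin(2 * math.pi * 440 * t / orig_sr) for t in range(orig_sr)]
resampled = resample_linear(wave, orig_sr, target_sr)

duration_before = len(wave) / orig_sr
duration_after = len(resampled) / target_sr
print(len(wave), len(resampled), duration_before, duration_after)
```

The clip keeps its duration while the number of samples shrinks in proportion to the rate. This is why the same sampling rate has to be passed to the processor afterwards: the array alone does not say how many samples make up one second.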

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Resample the audio to a frequency of 16,000 Hz
dataset = dataset.____("____", ____(sampling_rate=____))

# Load the audio processor
processor = ____

# Preprocess the audio data of the 0th dataset element
audio_pp = ____(dataset[0]["audio"]["array"], sampling_rate=____, padding=True, return_tensors="pt")
make_spectrogram(audio_pp["input_features"][0])