Preprocess audio datasets
You're enhancing your precision agriculture application by enabling farmers to control their machinery with voice commands. The system should recognize keywords in commands like "Turn on the sprinkler irrigation system."
You'll leverage a keyword spotting dataset with audio clips of keywords like "on." Preprocess the audio files so they can be used with a pre-trained Transformer model!
Some data has been pre-loaded:
- `dataset` contains a sample training dataset of audio files. It already contains the `train` split, so you don't need to specify `train` when using `dataset`.
- `AutoFeatureExtractor` has been imported from `transformers`.
- `model` is equal to `facebook/wav2vec2-base`.
- `max_duration` is defined as 1 second.
This exercise is part of the course Efficient AI Model Training with PyTorch.
Exercise instructions
- Load a pre-trained `feature_extractor` with the `AutoFeatureExtractor` class.
- Set the `sampling_rate` using the rate from the `feature_extractor`.
- Set the `max_length` of the `audio_arrays` using `max_duration`.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load a pre-trained feature extractor
feature_extractor = ____.____(model)

def preprocess_function(examples):
    audio_arrays = [x["array"] for x in examples["audio"]]
    inputs = feature_extractor(
        audio_arrays,
        # Set the sampling rate
        sampling_rate=____.____,
        # Set the max length
        max_length=int(feature_extractor.sampling_rate * max_duration),
        truncation=True)
    return inputs

encoded_dataset = dataset.map(preprocess_function, remove_columns=["audio", "file"], batched=True)
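To see why `max_length` multiplies the sampling rate by `max_duration`, it helps to work the arithmetic in samples rather than seconds. The sketch below assumes the 16 kHz rate that wav2vec2-style models expect (the actual value comes from `feature_extractor.sampling_rate` in the exercise) and mimics what `truncation=True` does to a clip longer than `max_duration`:

```python
# Assumed values: wav2vec2 models are trained on 16 kHz audio,
# and max_duration is 1 second, as in the exercise setup.
sampling_rate = 16000
max_duration = 1.0

# max_length is a count of samples, not seconds:
# 16,000 samples/second * 1 second = 16,000 samples.
max_length = int(sampling_rate * max_duration)
print(max_length)  # 16000

# With truncation=True, a clip longer than max_length is cut down to it.
# This list of zeros stands in for a 1.5-second x["array"] waveform.
clip = [0.0] * int(sampling_rate * 1.5)
truncated = clip[:max_length]
print(len(truncated))  # 16000
```

In other words, every encoded example ends up at most one second long in model time, which keeps the Transformer's input length bounded and uniform across the batch.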