Preprocess audio datasets

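You are enhancing a precision agriculture application so that farmers can control their machinery with voice commands. The system must recognize keywords in commands such as "Turn on the sprinkler irrigation system."

You will use a keyword-spotting dataset that contains audio clips of keywords such as "on". Preprocess the audio files so they can be used with a pre-trained Transformer model!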

The following have been preloaded:

dataset contains a sample training dataset of audio files. It already holds the train split, so you do not need to specify train when using dataset.
AutoFeatureExtractor has been imported from transformers.
model is set to facebook/wav2vec2-base.
max_duration is defined as 1 second.

This exercise is part of the course

Efficient AI Model Training with PyTorch

Exercise instructions

Load a pre-trained feature_extractor using the AutoFeatureExtractor class.
Set sampling_rate using the sampling rate of the feature_extractor.
Use max_duration to set max_length, the maximum number of samples kept from each clip in audio_arrays.
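The relationship between max_duration and max_length is plain arithmetic: a duration in seconds multiplied by a sampling rate in samples per second gives a sample count. The sketch below illustrates this in pure Python. It assumes a sampling rate of 16,000 Hz, which is what the facebook/wav2vec2-base feature extractor is expected to report (check feature_extractor.sampling_rate to confirm). The helper names max_length_in_samples and truncate are hypothetical and only mimic the effect of truncation=True.

```python
def max_length_in_samples(sampling_rate: int, max_duration: float) -> int:
    """Convert a clip duration in seconds to a number of samples."""
    return int(sampling_rate * max_duration)


def truncate(waveform: list, max_length: int) -> list:
    """Keep at most max_length samples; shorter clips are left unchanged."""
    return waveform[:max_length]


sampling_rate = 16_000  # assumption: the rate expected by wav2vec2-base
max_duration = 1.0      # seconds, as defined in the exercise

max_length = max_length_in_samples(sampling_rate, max_duration)

long_clip = [0.0] * 24_000   # 1.5 s of silence at 16 kHz
short_clip = [0.0] * 12_000  # 0.75 s of silence at 16 kHz

long_len = len(truncate(long_clip, max_length))
short_len = len(truncate(short_clip, max_length))
print(max_length, long_len, short_len)
```

Truncation only shortens clips that exceed max_length. Clips shorter than one second keep their original length unless padding is also requested.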

Complete the sample code below to solve this exercise.

# Load a pre-trained feature extractor
feature_extractor = ____.____(model)

def preprocess_function(examples):
    audio_arrays = [x["array"] for x in examples["audio"]]
    inputs = feature_extractor(
        audio_arrays,
        # Set the sampling rate
        sampling_rate=____.____, 
        # Set the max length
        max_length=int(feature_extractor.sampling_rate * max_duration), 
        truncation=True)
    return inputs

encoded_dataset = dataset.map(preprocess_function, remove_columns=["audio", "file"], batched=True)
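One possible way to fill in the blanks is AutoFeatureExtractor.from_pretrained(model) for the loader and feature_extractor.sampling_rate for the sampling rate. The sketch below shows how the pieces fit together when dataset.map is called with batched=True, in which case the function receives a dictionary of lists rather than a single row. So that it runs without downloading a model, it uses StandInExtractor, a hypothetical stand-in that only truncates and does not normalize the audio as a real feature extractor may do. The factory make_preprocess_function is also an illustrative name, not part of the exercise.

```python
class StandInExtractor:
    """Hypothetical offline stand-in for a real feature extractor.

    In the exercise this would instead be:
        feature_extractor = AutoFeatureExtractor.from_pretrained(model)
    """

    sampling_rate = 16_000  # assumption: the rate expected by wav2vec2-base

    def __call__(self, audio_arrays, sampling_rate, max_length, truncation):
        if sampling_rate != self.sampling_rate:
            raise ValueError("audio must match the extractor's sampling rate")
        # Mimic truncation=True: cut clips longer than max_length samples.
        clipped = [a[:max_length] if truncation else a for a in audio_arrays]
        return {"input_values": clipped}


def make_preprocess_function(feature_extractor, max_duration):
    """Build a batched preprocessing function around a feature extractor."""

    def preprocess_function(examples):
        # With batched=True, examples["audio"] is a list of decoded clips.
        audio_arrays = [x["array"] for x in examples["audio"]]
        return feature_extractor(
            audio_arrays,
            sampling_rate=feature_extractor.sampling_rate,
            max_length=int(feature_extractor.sampling_rate * max_duration),
            truncation=True,
        )

    return preprocess_function


preprocess_function = make_preprocess_function(StandInExtractor(), max_duration=1.0)

# A toy batch shaped like the one dataset.map passes in: one long, one short clip.
batch = {"audio": [{"array": [0.0] * 20_000}, {"array": [0.0] * 8_000}]}
encoded = preprocess_function(batch)
lengths = [len(values) for values in encoded["input_values"]]
print(lengths)
```

Reading the sampling rate from the feature extractor, rather than hard-coding it, keeps the preprocessing consistent with whatever rate the chosen pre-trained model was trained on.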
