
Multiple Speakers 2

Distinguishing between multiple speakers in one audio file is called speaker diarization. However, as you've seen, the free function we've been using, recognize_google(), can't tell different speakers apart when it transcribes.

One way around this, without using one of the paid speech-to-text services, is to ensure each of your audio files contains a single speaker.

This means if you were working with phone call data, you would make sure the caller and receiver are recorded separately. Then you could transcribe each file individually.
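If a call was not recorded separately but was saved as a stereo WAV file with one speaker per channel, the channels can be split into single-speaker files. The sketch below is only an illustration under that assumption: the helper name split_stereo_wav and the file names are hypothetical, and it uses Python's standard wave module with uncompressed PCM audio.

```python
import wave


def split_stereo_wav(stereo_path, left_path, right_path):
    """Write each channel of a stereo PCM WAV file to its own mono WAV file.

    Assumes one speaker per channel (hypothetical setup, e.g. caller on the
    left channel and receiver on the right).
    """
    with wave.open(stereo_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a stereo (2-channel) WAV file")
        width = src.getsampwidth()  # bytes per sample
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Stereo frames are interleaved: left sample, then right sample
    frame_size = 2 * width
    left = bytearray()
    right = bytearray()
    for start in range(0, len(frames), frame_size):
        left += frames[start:start + width]
        right += frames[start + width:start + frame_size]

    # Save each channel as a mono file with the same sample width and rate
    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(bytes(data))
```

Each resulting mono file could then be transcribed on its own. Recordings where both speakers share a single channel cannot be separated this way.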

In this exercise, we'll transcribe each speaker from our multiple speakers audio individually, using one audio file per speaker.

This exercise is part of the course

Spoken Language Processing in Python


Exercise instructions

  • Pass speakers to the enumerate() function to loop through the different speakers.
  • Call record() on recognizer to convert the AudioFiles into AudioData.
  • Use recognize_google() to transcribe each of the speaker_audio objects.

Interactive exercise

Try this exercise by completing this sample code.

import speech_recognition as sr

recognizer = sr.Recognizer()

# Multiple speakers on different files
speakers = [sr.AudioFile("speaker_0.wav"), 
            sr.AudioFile("speaker_1.wav"), 
            sr.AudioFile("speaker_2.wav")]

# Transcribe each speaker individually
for i, speaker in enumerate(____):
    with speaker as source:
        speaker_audio = recognizer.____(source)
    print(f"Text from speaker {i}:")
    print(recognizer.____(____,
                          language="en-US"))
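
For reference, the three steps in the instructions can be sketched as a small function. This is one possible way to fill in the blanks, not necessarily the course's official solution. The function name transcribe_speakers is hypothetical, and the recognizer and audio files are passed in as arguments, so the sketch itself does not import anything.

```python
def transcribe_speakers(recognizer, speakers, language="en-US"):
    """Return one transcript per single-speaker audio file, in order.

    `recognizer` is expected to behave like sr.Recognizer and each item in
    `speakers` like sr.AudioFile (assumption based on the exercise code).
    """
    transcripts = []
    for i, speaker in enumerate(speakers):
        # Opening the audio file gives a source the recognizer can read
        with speaker as source:
            # record() reads the whole source into an audio data object
            speaker_audio = recognizer.record(source)
        # Transcribe this speaker's audio on its own
        text = recognizer.recognize_google(speaker_audio, language=language)
        print(f"Text from speaker {i}:")
        print(text)
        transcripts.append(text)
    return transcripts


# Possible usage with the SpeechRecognition package (needs network access):
#   import speech_recognition as sr
#   speakers = [sr.AudioFile(f"speaker_{n}.wav") for n in range(3)]
#   transcribe_speakers(sr.Recognizer(), speakers)
```

Returning the transcripts as a list, in addition to printing them, keeps each speaker's text available for later processing.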