Reading audio files with SpeechRecognition

1. Reading audio files with SpeechRecognition

In the last lesson, we transcribed a portion of a customer support audio file. But as you'll remember from earlier lessons, audio files require a bit of preprocessing before they can be worked with.

2. The AudioFile class

Luckily, the SpeechRecognition library has a built-in class, AudioFile, along with a handy method in the Recognizer class, record. We can use these to take care of the preprocessing for us. It was done for us in the last lesson, but in this lesson we'll go end-to-end. To begin, we import the SpeechRecognition library and create a Recognizer instance as before. Then, to read in our audio file, we pass its filename to the AudioFile class and save the result to a variable. In this case, our AudioFile variable is called clean_support_call. Now if we check the type of clean_support_call, we can see it's an instance of AudioFile.
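The steps just described can be sketched roughly as follows. The helper name make_audio_file and the filename clean_support_call.wav are assumptions made for illustration; the lesson doesn't give either. The import sits inside the function only so the sketch still loads where the library isn't installed.

```python
def make_audio_file(path):
    """Return a Recognizer and an AudioFile wrapping `path`.

    Creating the AudioFile does not read any audio yet; that happens
    later, when it is opened with a `with` statement.
    """
    # Third-party package (pip install SpeechRecognition), imported here
    # so the sketch still loads where the library is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    audio_file = sr.AudioFile(path)
    return recognizer, audio_file


# Usage, assuming a file named clean_support_call.wav exists:
# recognizer, clean_support_call = make_audio_file("clean_support_call.wav")
# print(type(clean_support_call))
```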

3. From AudioFile to AudioData

Let's see what happens if we pass our clean_support_call variable to the recognize_google method. It errors, stating that the audio_data parameter must be of type AudioData. Our clean_support_call variable is currently of type AudioFile. To convert it to AudioData, we can use the Recognizer class's built-in record method. Let's see it. We use a context manager, also known as a with statement, to open and read the audio file we've saved to clean_support_call as source. Then we create clean_support_call_audio by calling the record method and passing it source. Now, before we call recognize_google again, let's check the type of clean_support_call_audio. Beautiful, it's an instance of AudioData, just what we needed.
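Here is a minimal sketch of that conversion. The wrapper function read_audio_data is a hypothetical helper, not part of the library, and the path passed to it is assumed to point at a readable audio file such as a WAV.

```python
def read_audio_data(path):
    """Read a whole audio file into an AudioData instance."""
    # Third-party package (pip install SpeechRecognition), imported here
    # so the sketch still loads where the library is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    clean_support_call = sr.AudioFile(path)

    # The with statement opens the file and closes it again afterwards.
    with clean_support_call as source:
        # record() reads from the opened source and returns AudioData.
        clean_support_call_audio = recognizer.record(source)

    return clean_support_call_audio


# Usage, assuming a file named clean_support_call.wav exists:
# audio = read_audio_data("clean_support_call.wav")
# print(type(audio))
```

Note that record is called inside the with block: the source has to be open while it is being read.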

4. Transcribing our AudioData

Now that clean_support_call_audio is an instance of AudioData, let's call recognize_google again and pass it in. Much better. Before you try it out for yourself, there are two parameters of the record method you should know about: duration and offset.
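Putting the pieces so far together, an end-to-end transcription might look like the sketch below. The function name transcribe_file is a hypothetical helper. Since recognize_google sends the audio to Google's web speech API, running it needs an internet connection.

```python
def transcribe_file(path):
    """Transcribe an audio file end-to-end and return the text."""
    # Third-party package (pip install SpeechRecognition), imported here
    # so the sketch still loads where the library is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    # Step 1: open the file and convert it to AudioData.
    with sr.AudioFile(path) as source:
        audio_data = recognizer.record(source)

    # Step 2: pass the AudioData (not the AudioFile) to recognize_google.
    return recognizer.recognize_google(audio_data)


# Usage, assuming a file named clean_support_call.wav exists
# and a network connection is available:
# print(transcribe_file("clean_support_call.wav"))
```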

5. Duration and offset

The record method records up to duration seconds of audio from source, starting at offset. Both are set to None by default, which means record will read from the beginning of the file until there is no more audio. You can change this by setting them to a float value. For example, if you only wanted the first 2 seconds of each of your audio files, you could set duration to 2. The offset parameter can be used to skip over a specified number of seconds at the start of an audio file. For example, if you didn't want the first 5 seconds of your audio files, you could set offset to 5. These parameters can be helpful if you know there are parts of your audio files you don't need. But remember, altering these parameters may cut off your audio in undesirable places, such as the middle of a word. The best values will be found by experimentation. We'll see more audio file manipulation later in the course.
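To experiment with these two parameters, something like the following sketch may help. Both record_slice and audio_seconds are hypothetical helpers written for this example; audio_seconds assumes single-channel audio, which is how AudioData stores its samples.

```python
def record_slice(path, offset=None, duration=None):
    """Read part of an audio file into an AudioData instance."""
    # Third-party package (pip install SpeechRecognition), imported here
    # so the sketch still loads where the library is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # Skip `offset` seconds, then keep up to `duration` seconds.
        return recognizer.record(source, duration=duration, offset=offset)


def audio_seconds(audio_data):
    """Approximate length of an AudioData instance in seconds."""
    bytes_per_second = audio_data.sample_rate * audio_data.sample_width
    return len(audio_data.frame_data) / bytes_per_second


# Usage, assuming a file named clean_support_call.wav exists:
# first_two = record_slice("clean_support_call.wav", duration=2.0)
# skip_five = record_slice("clean_support_call.wav", offset=5.0)
# print(audio_seconds(first_two), audio_seconds(skip_five))
```

Checking the resulting length this way is a quick sanity test before transcribing. Expect the measured length to differ slightly from the requested values, because the file is read in fixed-size chunks.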

6. Let's practice!

Alright, enough talk, let's see speech transcription with SpeechRecognition in action!