SpeechRecognition Python library

1. SpeechRecognition Python library

To get started with spoken language recognition, let's check out the SpeechRecognition Python library. We'll start with why we're using the SpeechRecognition library, and then we'll see how we can use Google's Web Speech API to transcribe speech to text.

2. Why the SpeechRecognition library?

Automatic speech recognition is a tough challenge, and there's no shortage of companies and research institutions working on libraries to help solve it. There's the Sphinx library by Carnegie Mellon University, Kaldi, SpeechRecognition, and more. Some have more robust features than others, but they all share the goal of transcribing audio to text. We're going to focus on the SpeechRecognition library because of its low barrier to entry and its compatibility with many available speech recognition APIs, which we'll see shortly.

3. Getting started with SpeechRecognition

We can get started with the SpeechRecognition library by installing it from PyPI with pip, running the pip install SpeechRecognition command in a terminal or shell. It's compatible with Python 2 and 3, but we'll be using Python 3.
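
After running the install command, one quick way to confirm it worked is to ask Python for the installed version. This is a minimal sketch using only the standard library; the helper name installed_version is made up for illustration and is not part of SpeechRecognition.

```python
from importlib import metadata


def installed_version(package="SpeechRecognition"):
    """Return the installed version of a package, or None if it is missing.

    Hypothetical helper for illustration only.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("SpeechRecognition not found; run: pip install SpeechRecognition")
else:
    print(f"SpeechRecognition {version} is installed")
```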

4. Using the Recognizer class

Now that we have SpeechRecognition installed, let's check out where all the magic happens: the Recognizer class. So how do we use it? To access the Recognizer class, we'll first import the speech_recognition module under the alias sr. Then we'll create an instance of the Recognizer class by calling sr.Recognizer() and assigning it to a variable, recognizer. Finally, we'll set the recognizer's energy_threshold to 300. The energy threshold can be thought of as the loudness above which audio is considered speech: values below the threshold are treated as silence, values above as speech. A silent room is typically between 0 and 100. SpeechRecognition's documentation recommends 300 as a starting value, which covers most speech files. The energy threshold value will adjust automatically as the recognizer listens to an audio file.
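
The three steps just described can be sketched as follows. The try/except guard around the import is an addition so the snippet still runs where the library isn't installed. The dynamic_energy_threshold attribute is the library's switch for the automatic adjustment mentioned above; it is printed here only for reference.

```python
try:
    import speech_recognition as sr
except ImportError:
    sr = None  # SpeechRecognition is not installed in this environment

recognizer = None
if sr is not None:
    # Create an instance of the Recognizer class
    recognizer = sr.Recognizer()

    # Audio quieter than this value is treated as silence, louder as speech
    recognizer.energy_threshold = 300

    print("Energy threshold:", recognizer.energy_threshold)
    print("Adjusts automatically:", recognizer.dynamic_energy_threshold)
else:
    print("Install the library first: pip install SpeechRecognition")
```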

5. Using the Recognizer class to recognize speech

Now that we've got a recognizer instance ready, it's time to recognize some speech. We chose SpeechRecognition for its flexibility. Here's what I mean. SpeechRecognition has built-in functions for working with many of the best speech recognition APIs. recognize_bing() accesses Microsoft's Cognitive Services, recognize_google() uses Google's free Web Speech API, recognize_google_cloud() accesses Google's Cloud Speech API, and recognize_wit() uses the Wit.ai platform. They all accept audio data and return text, which is hopefully the transcribed speech from the audio file. Remember, speech recognition is still far from perfect.
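
To see which of these functions your installed version actually provides, you can inspect the Recognizer class. This is a rough sketch: the helper list_recognize_methods is a made-up name, and the exact set of methods printed depends on the library version, so no particular output is assumed.

```python
try:
    import speech_recognition as sr
except ImportError:
    sr = None  # SpeechRecognition is not installed in this environment


def list_recognize_methods(recognizer_class):
    """Return the sorted names of attributes starting with 'recognize_'.

    Hypothetical helper for illustration only.
    """
    return sorted(
        name for name in dir(recognizer_class) if name.startswith("recognize_")
    )


methods = list_recognize_methods(sr.Recognizer) if sr is not None else []
for name in methods:
    print(name)
```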

6. SpeechRecognition Example

We'll be using the recognize_google() function since it's free and doesn't require an API key. However, this limits us to 50 requests per day, and if our audio files are too long, it may time out. In my experience, I've had no issues with audio files under 5 minutes. So if you have more audio files or long audio files, you may want to look into one of the paid API services. Let's put everything together with an example. We'll start by importing the speech_recognition library as sr. Then we'll instantiate the Recognizer class. Finally, we call recognize_google(), which takes the required parameter audio_data. We can also pass it the language our audio file is in. The default language is US English. We're using a mocked version of recognize_google() for this course so we don't go over the API limit. Running the function returns the speech detected in the audio file as text.
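
Here's a sketch of that example that runs offline. In the spirit of the mocked function mentioned above, it uses a stand-in class, MockRecognizer, which is invented for illustration and returns a fixed string; the course's actual mock isn't shown here. The commented-out lines show one way to swap in the real library. They assume a hypothetical file name and use the library's AudioFile and record() to load the audio, a step the walkthrough above doesn't cover.

```python
class MockRecognizer:
    """Hypothetical stand-in for sr.Recognizer so the example runs offline."""

    energy_threshold = 300

    def recognize_google(self, audio_data, language="en-US"):
        # A real call would send audio_data to Google's Web Speech API
        return "this is a mocked transcription"


def transcribe(recognizer, audio_data, language="en-US"):
    """Transcribe audio_data to text using the recognizer's recognize_google."""
    return recognizer.recognize_google(audio_data=audio_data, language=language)


text = transcribe(MockRecognizer(), audio_data=b"\x00\x00")
print(text)

# With the real library (needs network access and an audio file):
# import speech_recognition as sr
# recognizer = sr.Recognizer()
# recognizer.energy_threshold = 300
# with sr.AudioFile("my_audio.wav") as source:  # hypothetical file name
#     audio_data = recognizer.record(source)
# text = transcribe(recognizer, audio_data, language="en-US")
```

Passing the recognizer in as an argument is a design choice for this sketch: it lets the same transcribe() call work with either the mock or a real Recognizer instance.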

7. Your turn!

Now that you've seen a starter example of the SpeechRecognition library, it's time to try it out for yourself!