1
Introduction to Spoken Language Processing with Python
Audio files are different from most other types of data. Before you can start working with them, they require some preprocessing. In this chapter, you'll learn the first steps to working with speech files by converting two different audio files into soundwaves and comparing them visually.
2
Using the Python SpeechRecognition library
Speech recognition is still far from perfect, but the SpeechRecognition library provides an easy way to interact with many speech-to-text APIs. In this chapter, you'll learn how to use the SpeechRecognition library to start converting the spoken language in your audio files to text.
3
Manipulating Audio Files with PyDub
Not all audio files come in the same shape, size or format. Luckily, the PyDub library by James Robert provides tools you can use to programmatically change audio file attributes such as frame rate, number of channels, file format and more. In this chapter, you'll learn how to use this helpful library to ensure all of your audio files are in the right shape for transcription.
4
Processing text transcribed from spoken language
In this chapter, you'll put everything you've learned together by building a speech processing proof of concept project for a technology company, Acme Studios. You'll start by transcribing audio snippets of customer support phone calls to text. Then you'll perform sentiment analysis using NLTK, named entity recognition using spaCy and text classification using scikit-learn on the transcribed text.


Different kinds of audio

Now you've seen an example of how the Recognizer class works. Let's try a few more. How about speech from a different language?

What do you think will happen when we call recognize_google() on a Japanese version of good_morning.wav (japanese_audio)?

The default language is "en-US". Are the results the same with the "ja" tag?

How about non-speech audio, like this leopard roaring (leopard_audio)?

Or speech where the sounds may not be real words, such as a baby talking (charlie_audio)?

To become more familiar with the Recognizer class, we'll look at an example of each of these.
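To make the language comparison concrete, here is a minimal sketch of how the same clip could be sent to recognize_google() once per language tag. The helper names transcribe_in_languages and load_audio, and the file path in the usage comment, are assumptions made for illustration and are not part of the exercise. The sketch relies only on Recognizer, AudioFile, record() and the language parameter of recognize_google().

```python
def load_audio(path):
    """Read a whole audio file into an AudioData object.

    Assumes the SpeechRecognition package is installed; it is imported
    lazily so the helper below can be used without it.
    """
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # reads the entire file
    return recognizer, audio


def transcribe_in_languages(recognizer, audio, languages=("en-US", "ja")):
    """Return a {language_tag: transcript} dict for one audio clip.

    Note: recognize_google() may raise an error when it cannot make
    sense of the audio; that case is not handled in this sketch.
    """
    results = {}
    for tag in languages:
        results[tag] = recognizer.recognize_google(audio, language=tag)
    return results


# Hypothetical usage (the path is a placeholder):
# recognizer, japanese_audio = load_audio("good_morning_japanese.wav")
# print(transcribe_in_languages(recognizer, japanese_audio))
```

Calling the API once per tag keeps the two results side by side, which makes it easy to see how much the language setting changes the transcription of the same audio.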

1
Pass the Japanese version of good morning (japanese_audio) to recognize_google() using "en-US" as the language.

2
Pass the same Japanese audio (japanese_audio) using "ja" as the language parameter. Do you see a difference?
3
What about non-speech audio? Pass leopard_audio to recognize_google() with show_all set to True.
4
What if your speech files contain human sounds that aren't real words? Pass charlie_audio to recognize_google() to find out.
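For the non-speech and baby-talk steps, it can help to look at the raw response rather than a single transcript. The sketch below is one possible way to do that, not the exercise's reference solution. The helper names recognize_raw and top_transcripts are hypothetical. It assumes that with show_all=True, recognize_google() returns an empty list when nothing was recognized and otherwise a dictionary with an "alternative" list of candidate transcripts.

```python
def recognize_raw(recognizer, audio, language="en-US"):
    """Ask recognize_google() for the full response instead of one string."""
    return recognizer.recognize_google(audio, language=language, show_all=True)


def top_transcripts(raw):
    """Extract candidate transcripts from a show_all=True response.

    Assumption: an empty list (or anything without "alternative")
    means no speech was recognized, so None is returned.
    """
    if not isinstance(raw, dict) or not raw.get("alternative"):
        return None
    return [alt.get("transcript", "") for alt in raw["alternative"]]


# Hypothetical usage, given a Recognizer and the exercise's audio objects:
# print(top_transcripts(recognize_raw(recognizer, leopard_audio)))
# print(top_transcripts(recognize_raw(recognizer, charlie_audio)))
```

Returning None for an empty response lets you tell "nothing was recognized" apart from a real but odd transcription, which is the distinction these last two steps are probing.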