Chapter 1: Introduction to Spoken Language Processing with Python

Audio files are different from most other types of data: the sound wave is stored as a stream of raw bytes, so it needs some preprocessing before you can work with it. In this chapter, you'll take the first steps in working with speech files by converting two different audio files into sound waves and comparing them visually.

Introduction to audio data in Python

The right frequency

Importing an audio file with Python

Converting sound wave bytes to integers

The right data type

Bytes to integers

Finding the time stamps

Visualizing sound waves

Staying consistent

Processing audio data with Python
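
The first chapter's steps (reading a WAV file, converting its bytes to integers, giving each sample a time stamp, and plotting two sound waves for comparison) can be sketched with Python's built-in `wave` module, NumPy and Matplotlib. This is a minimal sketch under stated assumptions, not the course's exact code. It assumes mono, 16-bit WAV files. Because no real speech files are available here, the hypothetical helper `write_tone` and the hypothetical file names `tone_440hz.wav` and `tone_220hz.wav` generate sine tones to stand in for recordings.

```python
import wave

import matplotlib.pyplot as plt
import numpy as np


def write_tone(path, freq_hz=440.0, seconds=1.0, framerate=16000):
    """Write a mono, 16-bit sine tone to `path` (hypothetical stand-in for a speech recording)."""
    t = np.arange(int(seconds * framerate)) / framerate
    # WAV stores samples little-endian, hence the explicit "<i2" (2-byte signed int)
    samples = (0.5 * np.iinfo(np.int16).max * np.sin(2 * np.pi * freq_hz * t)).astype("<i2")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes per sample, i.e. 16-bit audio
        wav.setframerate(framerate)  # samples per second
        wav.writeframes(samples.tobytes())


def load_soundwave(path):
    """Return (timestamps in seconds, 16-bit samples, frame rate) for a mono 16-bit WAV file."""
    with wave.open(path, "rb") as wav:
        framerate = wav.getframerate()
        raw_bytes = wav.readframes(wav.getnframes())  # the whole file as a bytes object
    # Reinterpret the raw bytes as signed 16-bit integers (valid only for 2-byte samples)
    signal = np.frombuffer(raw_bytes, dtype="<i2")
    # Sample i is played i / framerate seconds after the start of the file
    timestamps = np.arange(len(signal)) / framerate
    return timestamps, signal, framerate


if __name__ == "__main__":
    # Overlay two hypothetical recordings to compare their sound waves visually
    for path, freq_hz in [("tone_440hz.wav", 440.0), ("tone_220hz.wav", 220.0)]:
        write_tone(path, freq_hz=freq_hz, seconds=0.02)
        timestamps, signal, _ = load_soundwave(path)
        plt.plot(timestamps, signal, alpha=0.5, label=path)
    plt.title("Comparing two sound waves")
    plt.xlabel("Time (seconds)")
    plt.ylabel("Amplitude")
    plt.legend()
    plt.show()
```

The `"<i2"` conversion only works because the file uses 2-byte samples on a single channel; a file with a different `getsampwidth()` or with two interleaved channels would need a different conversion. Time stamps are computed from the number of samples, not the number of bytes, since each sample here takes two bytes.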

Chapter 2: Using the Python SpeechRecognition library

Speech recognition is still far from perfect, but the SpeechRecognition library provides an easy way to interact with many speech-to-text APIs. In this chapter, you'll learn how to use it to start converting the spoken language in your audio files to text.

SpeechRecognition Python library

Pick the wrong speech_recognition API

Using the SpeechRecognition library

Using the Recognizer class

Reading audio files with SpeechRecognition

From AudioFile to AudioData

Recording the audio we need

Dealing with different kinds of audio

Different kinds of audio

Multiple Speakers 1

Multiple Speakers 2

Working with noisy audio

Chapter 3: Manipulating Audio Files with PyDub

Not all audio files come in the same shape, size or format. Luckily, the PyDub library by James Robert provides tools you can use to programmatically change audio file attributes such as frame rate, number of channels, file format and more. In this chapter, you'll learn how to use this helpful library to ensure all of your audio files are in the right shape for transcription.

Introduction to PyDub

Import an audio file with PyDub

Play an audio file with PyDub

Audio parameters with PyDub

Adjusting audio parameters

Manipulating audio files with PyDub

Turning it down... then up

Normalizing an audio file with PyDub

Chopping and changing audio files

Splitting stereo audio to mono with PyDub

Converting and saving audio files with PyDub

Exporting and reformatting audio files

Manipulating multiple audio files with PyDub

An audio processing workflow

Chapter 4: Processing text transcribed from spoken language

In this chapter, you'll put everything you've learned together by building a speech processing proof of concept for a technology company, Acme Studios. You'll start by transcribing customer support phone call audio snippets to text. Then you'll perform sentiment analysis using NLTK, named entity recognition using spaCy and text classification using scikit-learn on the transcribed text.

Creating transcription helper functions

Converting audio to the right format

Finding PyDub stats

Transcribing audio with one line

Using the helper functions you've built

Sentiment analysis on spoken language text

Analyzing sentiment of a phone call

Sentiment analysis on formatted text

Named entity recognition on transcribed text

Named entity recognition in spaCy

Creating a custom named entity in spaCy

Classifying transcribed speech with Sklearn

Preparing audio files for text classification

Transcribing phone call excerpts

Organizing transcribed phone call data

Create a spoken language text classifier

Congratulations!
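
Creating a custom named entity can be sketched with spaCy 3's `entity_ruler` component, which tags exact phrase matches with a label of your choice. To stay runnable without downloading a model, this sketch assumes a blank English pipeline; with a trained pipeline such as `en_core_web_sm`, you would typically add the ruler with `nlp.add_pipe("entity_ruler", before="ner")` so its matches take priority over the statistical recognizer. The patterns, the sentence and the helper name `build_nlp` are hypothetical.

```python
import spacy


def build_nlp():
    """A blank English pipeline plus a rule-based entity ruler (no trained model needed)."""
    nlp = spacy.blank("en")
    ruler = nlp.add_pipe("entity_ruler")
    # Hypothetical patterns: phrase patterns match the exact token text, case-sensitively
    ruler.add_patterns([
        {"label": "PRODUCT", "pattern": "smartphone"},
        {"label": "ORG", "pattern": "Acme Studios"},
    ])
    return nlp


if __name__ == "__main__":
    nlp = build_nlp()
    doc = nlp("I bought a smartphone from Acme Studios and it stopped working.")
    for ent in doc.ents:
        print(ent.text, ent.label_)
```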

Pre- and post-purchase audio snippet transcriptions
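
A text classifier for pre- and post-purchase transcriptions like these can be sketched as a scikit-learn `Pipeline` that turns text into word counts, reweights them with TF-IDF and fits a multinomial naive Bayes model. This particular combination of steps is an assumption for illustration, and the six training sentences are invented placeholders, not the course dataset; in practice the texts would come from transcribing the phone call snippets, and you would evaluate on held-out data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical transcriptions invented for illustration only
TRAIN_TEXTS = [
    "hi i would like to know if the headphones come in black before i order",
    "how much is the bluetooth speaker and is it in stock",
    "do you ship to australia and how long does delivery take",
    "my order arrived today but the box was damaged",
    "i bought headphones last week and want to return them for a refund",
    "the speaker stopped working after two days can i get a replacement",
]
TRAIN_LABELS = ["pre_purchase"] * 3 + ["post_purchase"] * 3


def build_classifier():
    """Bag-of-words counts -> TF-IDF weights -> multinomial naive Bayes."""
    return Pipeline([
        ("vectorizer", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("classifier", MultinomialNB()),
    ])


if __name__ == "__main__":
    text_classifier = build_classifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(text_classifier.predict(["is the speaker still in stock"]))
```

Wrapping the steps in a `Pipeline` means the vectorizer vocabulary and TF-IDF weights are learned only from the training texts, and new transcriptions pass through the same transformations at prediction time.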

<h2>Learn Speech Recognition and Spoken Language Processing in Python</h2>
We learn to speak long before we learn to read. Even in the digital age, our main method of communication is speech. Spoken Language Processing in Python will help you load, transform, and transcribe audio files. You’ll start by seeing what raw audio looks like in Python, then move on to exploring popular libraries and working through an example business use case.
<br><br>
<h2>Use Python SpeechRecognition and PyDub to Transcribe Audio Files</h2>
Python has a number of popular libraries that help you to process spoken language. SpeechRecognition offers you an easy way to integrate with speech-to-text APIs, while PyDub helps you to programmatically alter audio file attributes to get them ready for transcription. Each of these libraries is covered in an in-depth chapter, offering you the opportunity to put theory into practice to cement your knowledge. 
<br><br>
<h2>Practice Speech Transcription with an In-Course Project</h2>
The final chapter in this course offers you the opportunity to put everything you’ve learned together by building a speech processing proof of concept for a fictional technology company. You’ll build a system that transcribes phone call audio to text and then performs sentiment analysis to review customer support phone calls. 
<br><br>
By the end of this course, you’ll have both the knowledge and hands-on experience to put your learning into practice within your job or personal projects. 


Spoken Language Processing in Python Course with Speech Recognition | DataCamp
