1. Manipulating audio files with PyDub
Now that you've had a little experience with PyDub and the AudioSegment class, in this lesson we'll start to see just how powerful it is.
2. Turning it down to 11
Are your audio files too loud or too quiet? You can make your AudioSegments louder or quieter by adding or subtracting integers.
Let's make our wav file 60 decibels quieter.
If you tried to transcribe audio this quiet with recognize_google, as we saw in an earlier lesson, it would return an error.
3. Increasing the volume
In practice, you're more likely to want to increase the volume of your AudioSegments.
You can do this by adding an integer. This will increase your AudioSegment's average volume level by the same number of decibels.
If your audio files are too quiet or too loud, they may produce transcription errors. As you could imagine, speech transcription works best on clear, audible speech. If you can't hear it, chances are, a speech recognition system can't either.
4. This all sounds the same
Some audio files might differ in loudness throughout. They might begin quiet and then increase in sound as a person gets comfortable talking or adjusts the microphone.
The normalize function is great for taking care of this. It finds the highest level of audio throughout an AudioSegment and then raises the level of the whole AudioSegment so that this peak sits close to the maximum, bringing the quieter parts up with it.
You can import the normalize function from the pydub.effects module. Then, to bring up the sound levels in an AudioSegment, you pass it to the normalize function.
You can check the sound using the play function from the pydub.playback module.
Ensuring your audio file is the same loudness throughout can help with transcription.
5. Remixing your audio files
Another handy feature of AudioSegments is that they are sliceable and combinable. This is helpful if you need to cut your audio files down or combine them in some way.
Let's say you knew your audio files had 5 seconds of static at the beginning and you didn't want to waste compute power trying to transcribe the static.
You could use slicing to remove the first 5 seconds of audio. Since AudioSegments are indexed in milliseconds, you would do this by keeping only everything from 5000 onward.
And then the new AudioSegment won't contain the 5 seconds of static.
6. Remixing your audio files
Or what if your audio file came in separate parts, due to length issues or a broken recording?
You can easily add two AudioSegments together using the addition operator.
Operators on AudioSegments follow the usual order of operations, which for a chain of additions means left to right. So wav file 1 plus wav file 2 plus 10 will first combine wav files 1 and 2 and then increase the combination by 10 decibels.
If your audio files have different characteristics, combining them like this automatically scales parameters such as frame rate to match the higher-quality audio file.
7. Splitting your audio
In a previous lesson, you saw the issue of transcribing multiple speakers in one audio file.
Well, let's say you were trying to transcribe phone calls and, using PyDub, you found your audio files were recorded in stereo format, with two channels.
PyDub allows a stereo AudioSegment to be split into two mono (single-channel) AudioSegments using the split_to_mono function. Calling this returns a list containing each channel.
8. Splitting your audio
Because each of these is an AudioSegment, you can use all of the functionality you've seen previously on them.
And as long as your speakers have been recorded on separate channels, you can now transcribe their audio individually.
9. Let's code!
Okay, I bet you're now starting to realise how helpful PyDub can be for working with your audio and speech files. Before we go further, let's get hands-on!