Introduction to PyDub

1. Introduction to PyDub

As you know, a big part of working with data, especially audio files, is ensuring it is all in a consistent format. PyDub is a Python library made by James Robert that provides a gold mine of tools for manipulating audio files. Becoming familiar with PyDub will give you a programmatic way to ensure your audio files are consistent and in an ideal format for transcription, whether locally or through an API.

2. Installing PyDub

You can install PyDub via pip by running pip install pydub on the command line. If you're working only with wav files, PyDub works out of the box. However, for file formats like mp3, you'll need ffmpeg, an open source audio library, which can be downloaded from ffmpeg.org.

3. PyDub's main class, AudioSegment

Once you've installed PyDub, you'll find all of its functionality is built on one main class, AudioSegment. To use it, we import it using from pydub import AudioSegment. Then we can use AudioSegment and its from_file method to import an audio file. The from_file method requires the argument file, which takes a string containing the audio file's path. In our case, that is wav_file.wav. The format parameter takes the audio file's format but is optional, as it gets inferred from the file name. Remember, for file types other than wav, you'll need ffmpeg. Running this will create an instance of AudioSegment called wav_file, of type pydub.audio_segment.AudioSegment. You'll see soon how useful this class is.

4. Playing an audio file

Let's say you want to play an audio file to check its quality. You can use the play function on any AudioSegment. The play function requires simpleaudio or pyaudio for wav files and ffmpeg for all others. Since ours is a wav file, we'll install simpleaudio via pip. Then we import play from pydub.playback. To play our AudioSegment instance, wav_file, we pass it to the play function. Running the play function will play wav_file out loud. Note that, due to limitations of the DataCamp classroom, the play function does not work on DataCamp but will work locally.

5. Audio parameters

When you import a file with from_file, PyDub automatically infers a number of parameters about the file. These are stored as attributes on the AudioSegment instance. For example, the channels attribute shows you the number of channels: 1 for mono, 2 for stereo audio. The frame_rate attribute gives you the sample rate of your AudioSegment in Hertz.

6. Audio parameters

The sample_width attribute tells you the number of bytes per sample: 1 means 8-bit, 2 means 16-bit. The max attribute tells you the maximum amplitude of your audio file, which can be considered its loudness and is useful for normalizing sound levels.

7. Audio parameters

Finally, calling len on any AudioSegment will tell you the duration of the audio file in milliseconds.

8. Changing audio parameters

Having these parameters readily available helps you ensure all of your audio files are consistent. You can adjust them using functions named in the style set, underscore, attribute name, such as set_sample_width to adjust the sample width.

9. Changing audio parameters

Or you can use set_frame_rate to change the sample rate, and set_channels to alter the number of channels. Some APIs require your audio files to have certain values for these parameters. A rule of thumb is that the higher the values, excluding channels, the better. You should aim for a minimum frame rate of 16,000 Hertz and keep your audio files in wav format. We'll see how to convert audio files using PyDub in a later lesson.

10. Let's practice!

For now, let's practice importing and altering some audio files!