Create a spoken language text classifier

Now that you've transcribed some customer call audio data, we'll build a model to classify whether the text of a customer call is pre_purchase or post_purchase.

We've got 45 examples of pre_purchase calls and 57 examples of post_purchase calls.

The data the model will train on is stored in train_df and the data the model will predict on is stored in test_df.

Try printing the .head() of each of these to the console.
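
As a rough sketch of that step, the snippet below builds two tiny stand-in DataFrames and prints their first rows. The text and label column names match the ones used in this exercise; the rows themselves are invented placeholders, not the actual call transcripts, and in the exercise both DataFrames are already loaded for you.

```python
import pandas as pd

# Hypothetical stand-in data: the real train_df and test_df are preloaded
# in the exercise, so these rows are placeholders for illustration only.
train_df = pd.DataFrame({
    "text": [
        "do you offer free shipping on this item",
        "my order arrived with a broken screen",
        "is this jacket available in a larger size",
        "i would like to return the shoes i bought",
    ],
    "label": ["pre_purchase", "post_purchase", "pre_purchase", "post_purchase"],
})
test_df = pd.DataFrame({
    "text": ["how long does delivery usually take"],
    "label": ["pre_purchase"],
})

# .head() returns the first five rows by default (fewer if the frame is shorter)
print(train_df.head())
print(test_df.head())
```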

We'll build an sklearn pipeline using CountVectorizer() and TfidfTransformer() to convert our text samples to numbers and then use a MultinomialNB() classifier to learn what category each sample belongs to.
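
To make "convert our text samples to numbers" more concrete, here is a minimal sketch of the two text steps on their own, outside a pipeline. The three sample sentences are made up for illustration. CountVectorizer() turns each sample into a row of word counts, and TfidfTransformer() reweights those counts so that words appearing in many samples contribute less.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Made-up sample sentences, for illustration only
samples = [
    "do you ship to canada",
    "my order arrived damaged",
    "do you have this in blue",
]

# Step 1: learn the vocabulary and count word occurrences per sample.
# The result is a sparse matrix with one row per sample and one column per word.
count_vect = CountVectorizer()
counts = count_vect.fit_transform(samples)

# Step 2: rescale the raw counts into TF-IDF weights (same shape as counts).
tfidf = TfidfTransformer()
weights = tfidf.fit_transform(counts)

print(sorted(count_vect.vocabulary_))  # the words that became columns
print(counts.shape, weights.shape)     # both are (n_samples, n_words)
```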

This model will work well on our small example, but for larger amounts of text you may want to consider something more sophisticated.

Create text_classifier using CountVectorizer(), TfidfTransformer(), and MultinomialNB().
Fit text_classifier on train_df.text and train_df.label.
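
One possible way to carry out these two instructions is sketched below. The step names ("vectorizer", "tfidf", "classifier") are arbitrary labels chosen for this sketch, and the small DataFrames are hypothetical stand-ins so the snippet runs on its own; in the exercise, train_df and test_df are already loaded.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical stand-in data so the sketch is self-contained
train_df = pd.DataFrame({
    "text": [
        "do you offer free shipping on this item",
        "is this jacket available in a larger size",
        "my order arrived with a broken screen",
        "i would like to return the shoes i bought",
    ],
    "label": ["pre_purchase", "pre_purchase", "post_purchase", "post_purchase"],
})
test_df = pd.DataFrame({"text": ["how long does delivery usually take"]})

# Chain the text-to-numbers steps and the classifier into one estimator.
# The step names are assumptions made for this sketch.
text_classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("classifier", MultinomialNB()),
])

# Fit on the training text and labels
text_classifier.fit(train_df.text, train_df.label)

# The fitted pipeline accepts raw text directly when predicting
predictions = text_classifier.predict(test_df.text)
print(predictions)
```

Wrapping the three steps in a single Pipeline means the same vocabulary and TF-IDF weights learned during fit() are reused automatically at prediction time.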
