
Create a spoken language text classifier

Now that you've transcribed some customer call audio data, we'll build a model to classify whether the text from a customer call is pre_purchase or post_purchase.

We've got 45 examples of pre_purchase calls and 57 examples of post_purchase calls.

The data the model will train on is stored in train_df and the data the model will predict on is stored in test_df.

Try printing the .head() of each of these to the console.

We'll build an sklearn pipeline using CountVectorizer() and TfidfTransformer() to convert our text samples to numbers, and then use a MultinomialNB() classifier to learn which category each sample belongs to.
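To make the "text to numbers" step concrete, here is a minimal sketch of what the two preprocessing stages produce on their own. The three sample sentences are made up for illustration and are not taken from the course data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Made-up call snippets, for illustration only (not the course data)
samples = [
    "do you ship to canada",
    "my order arrived damaged",
    "my order has not arrived",
]

# Step 1: count how often each vocabulary word appears in each sample
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(samples)  # sparse matrix: samples x words

# Step 2: reweight the counts so words shared by many samples matter less
tfidf = TfidfTransformer()
weights = tfidf.fit_transform(counts)

# Columns follow the alphabetically sorted vocabulary
vocab = sorted(vectorizer.vocabulary_)
print(vocab)
print(counts.toarray())
print(weights.toarray().round(2))
```

Each row of the printed arrays is one sample and each column is one word, so the classifier only ever sees these numeric rows, never the raw text.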

This model will work well on our small example here, but for larger amounts of text you may want to consider something more sophisticated.

This exercise is part of the course

Spoken Language Processing in Python


Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Build the text_classifier as an sklearn pipeline
text_classifier = Pipeline([
    ('vectorizer', ____),
    ('tfidf', ____),
    ('classifier', ____),
])

# Fit the classifier pipeline on the training data
text_classifier.fit(____, ____)
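
One way the blanks might be filled in is sketched below. This is not the course's official solution: the column names `text` and `label` are assumptions, and the tiny `train_df` and `test_df` built here are hypothetical stand-ins so the example runs on its own.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical stand-ins for train_df and test_df; the real exercise data
# is larger, and the column names "text" and "label" are assumptions.
train_df = pd.DataFrame({
    "text": [
        "do you ship to canada",
        "what sizes does this shirt come in",
        "is this item in stock",
        "my order arrived damaged",
        "i want to return my order",
        "my order has not arrived yet",
    ],
    "label": ["pre_purchase"] * 3 + ["post_purchase"] * 3,
})
test_df = pd.DataFrame({
    "text": ["can i return my order", "do you ship this shirt"],
})

# Build the text_classifier as an sklearn pipeline
text_classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("classifier", MultinomialNB()),
])

# Fit the classifier pipeline on the training data
text_classifier.fit(train_df["text"], train_df["label"])

# Predict a category for each unseen sample
predictions = text_classifier.predict(test_df["text"])
print(predictions)
```

Because the pipeline bundles the vectorizer, the TF-IDF transformer, and the classifier, the same raw text column can be passed to both `fit()` and `predict()` without transforming it by hand.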