Create a spoken language text classifier
Now that you've transcribed some customer call audio data, we'll build a model to classify whether the text from a customer call is pre_purchase or post_purchase.
We've got 45 examples of pre_purchase calls and 57 examples of post_purchase calls.
The data the model will train on is stored in train_df and the data the model will predict on is stored in test_df.
Try printing the .head() of each of these to the console.
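As a rough illustration of what that inspection looks like, here is a minimal sketch using small stand-in DataFrames. The rows and the column names text and label are assumptions made up for this example; the real train_df and test_df supplied by the exercise may be laid out differently.

```python
import pandas as pd

# Hypothetical stand-ins for the exercise's DataFrames. The column names
# "text" and "label" and all rows are assumptions, not the course data.
train_df = pd.DataFrame({
    "text": [
        "do you ship to australia",
        "my order arrived damaged",
        "what sizes do you stock",
    ],
    "label": ["pre_purchase", "post_purchase", "pre_purchase"],
})
test_df = pd.DataFrame({
    "text": ["i would like a refund", "is this item available in blue"],
    "label": ["post_purchase", "pre_purchase"],
})

# .head() returns the first five rows by default,
# or every row when the DataFrame has fewer than five.
print(train_df.head())
print(test_df.head())
```

Looking at the first few rows is a quick way to confirm which column holds the transcribed text and which holds the label before passing them to a model.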
We'll build an sklearn pipeline using CountVectorizer() and TfidfTransformer() to convert our text samples to numbers and then use a MultinomialNB() classifier to learn what category each sample belongs to.
This model will work well on our small example here, but for larger amounts of text you may want to consider a more sophisticated model.
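The three-step pipeline described above can be sketched on a handful of made-up sentences. This is only an illustration under assumptions: the example texts and labels are invented here, and the exercise itself fits on columns of train_df rather than on plain Python lists.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Invented toy data for illustration only (not the course data).
train_texts = [
    "do you ship to australia",
    "what sizes do you stock",
    "how much does delivery cost",
    "my order arrived damaged",
    "i would like to return my shirt",
    "the package never arrived",
]
train_labels = [
    "pre_purchase", "pre_purchase", "pre_purchase",
    "post_purchase", "post_purchase", "post_purchase",
]

# Step 1: CountVectorizer turns each text into a vector of word counts.
# Step 2: TfidfTransformer reweights those counts by TF-IDF.
# Step 3: MultinomialNB learns which weighted words go with each label.
text_classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("classifier", MultinomialNB()),
])

# Fitting the pipeline runs all three steps in order on the training data.
text_classifier.fit(train_texts, train_labels)

# Predicting applies the same fitted transformations to unseen text.
predictions = text_classifier.predict([
    "can i return my order",
    "do you stock larger sizes",
])
print(predictions)
```

Wrapping the steps in a Pipeline means the vocabulary and TF-IDF weights learned during fit() are reused automatically at prediction time, so new text is encoded the same way as the training text.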
This exercise is part of the course Spoken Language Processing in Python.
Have a go at this exercise by completing this sample code.
# Build the text_classifier as an sklearn pipeline
text_classifier = Pipeline([
    ('vectorizer', ____),
    ('tfidf', ____),
    ('classifier', ____),
])
# Fit the classifier pipeline on the training data
text_classifier.fit(____, ____)