Create a spoken language text classifier

Now you've transcribed some customer call audio data, we'll build a model to classify whether the text from the customer call is pre_purchase or post_purchase.

We've got 45 examples of pre_purchase calls and 57 examples of post_purchase calls.

The data the model will train on is stored in train_df and the data the model will predict on is stored in test_df.

Try printing the .head() of each of these to the console.

We'll build an sklearn pipeline using CountVectorizer() and TfidfTransformer() to convert our text samples to numbers and then use a MultinomialNB() classifier to learn what category each sample belongs to.

This model will work well on our small example here but for larger amounts of text, you may want to consider something more sophisticated.

Este exercício faz parte do curso

Spoken Language Processing in Python

Ver curso

Exercício interativo prático

Experimente este exercício completando este código de exemplo.

# Build the text_classifier as an sklearn pipeline
text_classifier = Pipeline([
    ('vectorizer', ____),
    ('tfidf', ____),
    ('classifier', ____),
])

# Fit the classifier pipeline on the training data
text_classifier.fit(____, ____)

Editar e executar o código