Predicting the sentiment of a movie review
In the previous exercise, you generated the bag-of-words (BoW) representations of the training and test movie review data. In this exercise, you will use these representations to train a Naive Bayes classifier that detects the sentiment of a movie review, and then compute its accuracy. Note that since this is a binary classification problem, the model can only classify a review as either positive (1) or negative (0); it cannot detect neutral reviews.
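To make the idea concrete, here is a stdlib-only sketch of what a multinomial Naive Bayes classifier computes. Each class gets a log prior, and every word occurrence adds that word's smoothed log-likelihood under the class. This is a from-scratch illustration, not scikit-learn's implementation. The toy reviews are hypothetical, and alpha=1.0 is chosen to match the default of MultinomialNB. Because only the labels 0 and 1 appear in training, the classifier can only ever return one of those two values, which is why neutral reviews cannot be detected.

```python
import math
from collections import Counter

# Hypothetical toy training data: (tokens, label) with 1 = positive, 0 = negative.
train_docs = [
    ("great acting and great music".split(), 1),
    ("wonderful movie great story".split(), 1),
    ("terrible acting boring story".split(), 0),
    ("awful music terrible movie".split(), 0),
]

def train_multinomial_nb(docs, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods for each class."""
    vocab = {w for tokens, _ in docs for w in tokens}
    classes = sorted({label for _, label in docs})
    log_prior, log_lik = {}, {}
    for c in classes:
        class_docs = [tokens for tokens, label in docs if label == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for tokens in class_docs for w in tokens)
        total = sum(counts.values()) + alpha * len(vocab)
        # Additive (Laplace) smoothing: a vocabulary word absent from this
        # class's training text still gets a small nonzero probability.
        log_lik[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return classes, log_prior, log_lik

def predict(model, tokens):
    classes, log_prior, log_lik = model
    # Sum log-likelihoods of known words; words outside the training
    # vocabulary are ignored, as a fitted vectorizer would drop them.
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in tokens if w in log_lik[c])
        for c in classes
    }
    # Only the trained labels (here 0 or 1) can ever be returned.
    return max(scores, key=scores.get)

model = train_multinomial_nb(train_docs)
print(predict(model, "the acting was terrible".split()))  # → 0
print(predict(model, "great music".split()))  # → 1
```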
In case you don't recall, the training and test BoW vectors are available as X_train_bow and X_test_bow, respectively. The corresponding labels are available as y_train and y_test, respectively. Also, for your reference, the original movie review dataset is available as df.
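The previous exercise is not reproduced here, but a common way to produce variables like these is to fit a CountVectorizer on the training text only and then reuse it on the test text, so both matrices share the same columns. The sketch below assumes that setup; the toy reviews are hypothetical stand-ins for df, and the vectorizer settings used in the previous exercise may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy reviews standing in for the real train/test split of df.
train_reviews = [
    "A wonderful film with great acting",
    "Terrible plot and awful acting",
]
test_reviews = ["Great music but a terrible ending"]

# Learn the vocabulary from the training reviews only...
vectorizer = CountVectorizer(lowercase=True)
X_train_bow = vectorizer.fit_transform(train_reviews)
# ...then reuse that same vocabulary for the test reviews; words never seen
# in training (such as "music", "but" and "ending") are simply dropped.
X_test_bow = vectorizer.transform(test_reviews)

# Both matrices have one column per vocabulary word.
print(X_train_bow.shape, X_test_bow.shape)
```

With the default token pattern, single-character tokens such as "a" are also ignored, so the vocabulary here has nine words and both matrices have nine columns.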
This exercise is part of the course Feature Engineering for NLP in Python.
Exercise instructions
- Instantiate an object of MultinomialNB. Name it clf.
- Fit clf using X_train_bow and y_train.
- Measure the accuracy of clf using X_test_bow and y_test.
The following sample code completes the exercise:
from sklearn.naive_bayes import MultinomialNB

# Create a MultinomialNB object
clf = MultinomialNB()
# Fit the classifier on the training BoW vectors and labels
clf.fit(X_train_bow, y_train)
# Measure the accuracy on the test set
accuracy = clf.score(X_test_bow, y_test)
print("The accuracy of the classifier on the test set is %.3f" % accuracy)
# Predict the sentiment of a negative review
# (vectorizer is assumed to be the CountVectorizer fitted in the previous exercise)
review = "The movie was terrible. The music was underwhelming and the acting mediocre."
prediction = clf.predict(vectorizer.transform([review]))[0]
print("The sentiment predicted by the classifier is %i" % prediction)
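Because the exercise depends on variables created in the previous exercise, its code cannot run on its own. The self-contained sketch below replays the same steps on a few hypothetical toy reviews to show two details: clf.score returns mean accuracy (the same value as sklearn.metrics.accuracy_score on the predictions), and a new review must go through vectorizer.transform rather than fit_transform. The toy data and the default CountVectorizer settings are assumptions; the vectorizer in the previous exercise may be configured differently.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the real reviews (1 = positive, 0 = negative).
train_reviews = [
    "great acting and great music",
    "wonderful movie with a great story",
    "terrible acting and a boring story",
    "awful music and a terrible movie",
]
y_train = np.array([1, 1, 0, 0])
test_reviews = ["great story", "boring and awful"]
y_test = np.array([1, 0])

vectorizer = CountVectorizer()
X_train_bow = vectorizer.fit_transform(train_reviews)
X_test_bow = vectorizer.transform(test_reviews)

clf = MultinomialNB()
clf.fit(X_train_bow, y_train)

# score() returns mean accuracy: the fraction of test reviews predicted correctly.
y_pred = clf.predict(X_test_bow)
accuracy = clf.score(X_test_bow, y_test)
print(accuracy, accuracy_score(y_test, y_pred), np.mean(y_pred == y_test))

# A new review goes through transform (not fit_transform) so that its
# columns line up with the vocabulary the classifier was trained on.
review = "The movie was terrible. The music was underwhelming and the acting mediocre."
print(clf.predict(vectorizer.transform([review]))[0])
```

Calling fit_transform on the new review would instead build a fresh vocabulary from that single review, producing a matrix whose columns no longer match the ones clf was trained on.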