Higher order n-grams for sentiment analysis
Similar to a previous exercise, we are going to build a classifier that can detect if the review of a particular movie is positive or negative. However, this time, we will use n-grams up to n=2 for the task.
The n-gram training reviews are available as X_train_ng. The corresponding test reviews are available as X_test_ng. Finally, use y_train and y_test to access the training and test sentiment classes respectively.
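The exercise does not show how X_train_ng and X_test_ng were built. As a rough sketch, assuming they come from a scikit-learn CountVectorizer with ngram_range=(1, 2), features containing n-grams up to n=2 could be produced as follows. The toy reviews and the names toy_reviews, X_toy_ng and feature_names are hypothetical and only serve the illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy reviews; the exercise's real data is not shown here.
toy_reviews = ["the movie was good", "the movie was not good"]

# Assumption: ngram_range=(1, 2) keeps unigrams and bigrams,
# which is what "n-grams up to n=2" suggests.
ng_vectorizer = CountVectorizer(ngram_range=(1, 2))
X_toy_ng = ng_vectorizer.fit_transform(toy_reviews)

# The learned vocabulary mixes single words and word pairs.
feature_names = sorted(ng_vectorizer.vocabulary_)
print(feature_names)
print(X_toy_ng.shape)
```

Bigrams such as "not good" become features of their own, which lets a bag-of-words model pick up simple negation that unigrams alone would miss.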
This exercise is part of the course Feature Engineering for NLP in Python.
Exercise instructions
- Define an instance of MultinomialNB. Name it clf_ng.
- Fit the classifier on X_train_ng and y_train.
- Measure accuracy on X_test_ng and y_test using the score() method.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Define an instance of MultinomialNB
clf_ng = ____
# Fit the classifier
clf_ng.____(____, ____)
# Measure the accuracy
accuracy = ____
print("The accuracy of the classifier on the test set is %.3f" % accuracy)
# Predict the sentiment of a negative review
review = "The movie was not good. The plot had several holes and the acting lacked panache."
prediction = clf_ng.predict(ng_vectorizer.transform([review]))[0]
print("The sentiment predicted by the classifier is %i" % (prediction))
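One way the blanks in the template could be filled in is sketched below. It is not the course's reference solution. Because the preloaded variables are not available outside the exercise environment, the sketch rebuilds them from a handful of hypothetical toy reviews, assuming a CountVectorizer with ngram_range=(1, 2) and the label convention 1 = positive, 0 = negative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the exercise's preloaded variables.
# Assumed label convention: 1 = positive, 0 = negative.
train_reviews = [
    "the movie was good",
    "a great plot and good acting",
    "the movie was not good",
    "the plot was bad and the acting was not good",
]
y_train = [1, 1, 0, 0]
test_reviews = ["good acting", "not good"]
y_test = [1, 0]

# Assumption: unigram + bigram counts, fitted on the training reviews only.
ng_vectorizer = CountVectorizer(ngram_range=(1, 2))
X_train_ng = ng_vectorizer.fit_transform(train_reviews)
X_test_ng = ng_vectorizer.transform(test_reviews)

# Define an instance of MultinomialNB
clf_ng = MultinomialNB()

# Fit the classifier
clf_ng.fit(X_train_ng, y_train)

# Measure the accuracy (score() returns mean accuracy on the given data)
accuracy = clf_ng.score(X_test_ng, y_test)
print("The accuracy of the classifier on the test set is %.3f" % accuracy)

# Predict the sentiment of a negative review
review = "The movie was not good. The plot had several holes and the acting lacked panache."
prediction = clf_ng.predict(ng_vectorizer.transform([review]))[0]
print("The sentiment predicted by the classifier is %i" % (prediction))
```

With so little toy data the printed accuracy says nothing about real performance; the point is only the sequence of calls: MultinomialNB(), fit(), score(), then predict() on text transformed by the same vectorizer.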