
Exercise

Step 3: Building a classifier

This is the final step of the sentiment analysis workflow. In the previous steps, we explored the dataset, enriched it with sentiment-related features, and converted the text into numeric vectors.

You will use the dataset that you built in the previous steps. It contains a feature for the length of each review and 200 features created with a TF-IDF vectorizer.

Your task is to train a logistic regression model to predict the sentiment. The data has been imported for you and is called reviews_transformed. The target is called score and is binary: 1 when the product review is positive and 0 otherwise.

Train a logistic regression model and evaluate its performance on the test data. How well does the model do?

All the required packages have been imported for you.

Instructions

  • Perform the train/test split, allocating 20% of the data to testing and setting the random seed to 456.
  • Train a logistic regression model.
  • Predict the class labels for the test set.
  • Print out the accuracy score and the confusion matrix on the test set.
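
The steps above can be sketched as follows. Because the course data is not available outside the exercise, this sketch builds a small synthetic stand-in for reviews_transformed (random features named `f0` to `f9`, which are hypothetical); only the column name score, the 20% test size, and the seed 456 come from the exercise. The variable names and `max_iter=1000` are assumptions, not the course's reference solution.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for reviews_transformed (assumption, not the course data).
rng = np.random.default_rng(0)
n_rows, n_features = 500, 10
X_demo = rng.normal(size=(n_rows, n_features))
# The binary target depends on the first feature plus some noise.
score = (X_demo[:, 0] + 0.5 * rng.normal(size=n_rows) > 0).astype(int)
reviews_transformed = pd.DataFrame(
    X_demo, columns=[f"f{i}" for i in range(n_features)]
)
reviews_transformed.insert(0, "score", score)

# Separate the target from the features.
y = reviews_transformed["score"]
X = reviews_transformed.drop("score", axis=1)

# Hold out 20% of the rows for testing, with the seed from the instructions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=456
)

# Train the logistic regression model on the training split.
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predict the class labels for the held-out rows.
y_predicted = log_reg.predict(X_test)

# Evaluate on the test set.
accuracy = accuracy_score(y_test, y_predicted)
cm = confusion_matrix(y_test, y_predicted)
print("Accuracy on the test set:", accuracy)
print(cm)
```

In the confusion matrix returned by scikit-learn, rows are the true classes and columns are the predicted classes, so the diagonal counts the correct predictions and the accuracy equals the diagonal sum divided by the number of test rows.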