
Build and assess a model: movie reviews

In this exercise, you will build a logistic regression model using the movies dataset. The sentiment score is stored in the label column: it is 1 when the review is positive and 0 when it is negative. The review text has already been transformed into numeric columns using a bag-of-words (BOW) representation.

You have already built a classifier, but you evaluated it on the same data used in the training step. Now assess the model on an unseen test dataset. How does the performance of the model change when it is evaluated on the test set?

This exercise is part of the course

Sentiment Analysis in Python

Exercise instructions

  • Import the function required for a train/test split.
  • Perform the train/test split, specifying that 20% of the data should be used as a test set.
  • Train a logistic regression model.
  • Print out the accuracy of the model on the training and on the testing data.
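The steps above can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the exercise solution: the features come from scikit-learn's make_classification rather than the movies dataset, and max_iter=1000 is a choice made here so the solver has room to converge.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the movies data (assumption: 1000 rows, 50 numeric features)
X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

# Hold out 20% of the rows as an unseen test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the model on the training rows only
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# For classifiers, score() returns the mean accuracy on the given data
train_acc = log_reg.score(X_train, y_train)
test_acc = log_reg.score(X_test, y_test)
print('Accuracy on train set: ', train_acc)
print('Accuracy on test set: ', test_acc)
```

Comparing the two printed values shows how far the training accuracy differs from the accuracy on data the model has not seen.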

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the required packages
from sklearn.linear_model import LogisticRegression
____

# Define the vector of labels and matrix of features
y = movies.label
X = movies.drop('label', axis=1)

# Perform the train-test split
X_train, X_test, y_train, y_test = ____(X, y, ____=0.2, random_state=42)

# Build a logistic regression model and print out the accuracy
log_reg = ____.____
print('Accuracy on train set: ', log_reg.____(____, ____))
print('Accuracy on test set: ', log_reg.____(____, ____))