Build and assess a model: movie reviews
In this problem, you will build a logistic regression model using the movies dataset. The score is stored in the label column and is 1 when the review is positive and 0 when it is negative. The text of each review has been transformed into numeric columns using a bag-of-words (BOW) representation.
You have already built a classifier, but you evaluated it on the same data that was used to train it. Now assess the model on an unseen test dataset. How does the model's performance change when it is evaluated on the test set?
This exercise is part of the course
Sentiment Analysis in Python
Exercise instructions
- Import the function required for a train/test split.
- Perform the train/test split, specifying that 20% of the data should be used as a test set.
- Train a logistic regression model.
- Print out the accuracy of the model on the training and on the testing data.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the required packages
from sklearn.linear_model import LogisticRegression
____
# Define the vector of labels and matrix of features
y = movies.label
X = movies.drop('label', axis=1)
# Perform the train-test split
X_train, X_test, y_train, y_test = ____(X, y, ____=0.2, random_state=42)
# Build a logistic regression model and print out the accuracy
log_reg = ____.____
print('Accuracy on train set: ', log_reg.____(____, ____))
print('Accuracy on test set: ', log_reg.____(____, ____))
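
The steps in the template can be sketched end to end as follows. This is one possible completion, not necessarily the course's reference solution. The real movies dataset is not available here, so the sketch builds a small synthetic stand-in with hypothetical word_0 ... word_19 count columns and a label column; those column names, the data itself, and the max_iter setting are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the course's `movies` DataFrame (an assumption):
# 500 reviews, 20 bag-of-words count columns, and a binary `label` column.
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(500, 20))
# Tie the label to the first two word counts so there is a signal to learn.
label = (counts[:, 0] - counts[:, 1] + rng.normal(0, 1, 500) > 0).astype(int)
movies = pd.DataFrame(counts, columns=[f"word_{i}" for i in range(20)])
movies["label"] = label

# Define the vector of labels and matrix of features
y = movies.label
X = movies.drop('label', axis=1)

# Hold out 20% of the rows as an unseen test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training rows only; max_iter is raised as a precaution
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# score() returns mean accuracy for classifiers
train_acc = log_reg.score(X_train, y_train)
test_acc = log_reg.score(X_test, y_test)
print('Accuracy on train set: ', train_acc)
print('Accuracy on test set: ', test_acc)
```

Accuracy on the training set is often somewhat higher than on the test set, because the model has already seen the training rows. The size of that gap on the real movies data will differ from this synthetic example.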