Build and assess a model: product reviews data
In this exercise, you will build a logistic regression model using the reviews dataset, which contains customers' reviews of Amazon products. The array y contains the sentiment: 1 if positive and 0 otherwise. The array X contains all numeric features created using a bag-of-words (BOW) approach. Feel free to explore them in the IPython Shell.
Your task is to build a logistic regression model and calculate the accuracy and confusion matrix using the test dataset.
The logistic regression and train/test splitting functions have been imported for you.
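For readers unfamiliar with how BOW features like those in X come about, here is a minimal sketch. It assumes scikit-learn's CountVectorizer and three made-up reviews; the exercise does not show how its own X was built, so treat the names and data here as purely illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy reviews; these are NOT taken from the exercise dataset.
reviews = [
    "great product works great",
    "terrible product broke quickly",
    "works as expected",
]

# Learn the vocabulary and count how often each token occurs per review.
vect = CountVectorizer()
X_bow = vect.fit_transform(reviews)   # sparse matrix: one row per review

# Column order follows the alphabetically sorted vocabulary.
vocab = sorted(vect.vocabulary_)
counts = X_bow.toarray()

print(vocab)
print(counts)
```

Each row is one review and each column is one token, so a model such as logistic regression can consume the counts directly as numeric features.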
This exercise is part of the course
Sentiment Analysis in Python
Exercise instructions
- Import the accuracy score and confusion matrix functions.
- Split the data into training and testing sets, using 30% of it as a test set, and set the random seed to 42.
- Train a logistic regression model.
- Print out the accuracy score and confusion matrix using the test data.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the accuracy and confusion matrix
____
# Split the data into training and testing
X_train, X_test, y_train, y_test = ____(____, ____, ____=0.3, ____=42)
# Build a logistic regression
log_reg = ____._____
# Predict the labels
y_predict = log_reg.predict(X_test)
# Print the performance metrics
print('Accuracy score of test data: ', ____(____, ____))
print('Confusion matrix of test data: \n', ____(____, ____)/len(y_test))
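The same workflow can be sketched end to end on synthetic data. This is one possible way to follow the steps, not the official solution: X_demo and y_demo are hypothetical stand-ins generated with NumPy, since the real X and y exist only in the exercise environment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 200 samples with 10 count-like features.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 5, size=(200, 10))
# Label is 1 when the first feature exceeds the second, 0 otherwise.
y_demo = (X_demo[:, 0] > X_demo[:, 1]).astype(int)

# Hold out 30% of the rows for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Fit the model on the training rows and predict the held-out labels.
log_reg = LogisticRegression().fit(X_train, y_train)
y_predict = log_reg.predict(X_test)

# Accuracy is the share of correct predictions on the test set.
acc = accuracy_score(y_test, y_predict)
# Dividing by the test-set size turns raw counts into proportions.
cm = confusion_matrix(y_test, y_predict) / len(y_test)

print('Accuracy score of test data: ', acc)
print('Confusion matrix of test data: \n', cm)
```

Because the confusion matrix is divided by len(y_test), its four entries sum to 1 and its diagonal sums to the accuracy score, which makes the two printed metrics easy to cross-check.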