Product reviews with regularization
In this exercise, you will work once more with the reviews dataset of Amazon product reviews. A vector of labels y contains the sentiment: 1 if positive and 0 otherwise. The matrix X contains all numeric features created using a bag-of-words (BOW) approach.
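You will need to train two logistic regression models with different levels of regularization and compare how they perform on the test data. Remember that regularization is a way to control the complexity of the model. The more regularized a model is, the less flexible it is, but the better it can generalize, up to a point: too much regularization makes the model underfit. Models with a higher level of regularization are therefore often less accurate on the training data than non-regularized ones, even when they do as well or better on unseen data. Note that in scikit-learn's LogisticRegression the regularization parameter C is the inverse of the regularization strength: a large value such as 1000 means weak regularization, and a small value such as 0.001 means strong regularization.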
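One way to see that C works inversely is to fit the same model with several values of C and compare the size of the learned coefficients. This sketch uses synthetic data from make_classification (an assumption, not the reviews data). With the default L2 penalty, a smaller C penalizes large weights more heavily, so the coefficient norm should shrink as C decreases.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustration only, not the reviews dataset)
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

norms = {}
for C in [1000, 1, 0.001]:
    # C is the inverse of regularization strength: smaller C = stronger L2 penalty
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    norms[C] = np.linalg.norm(model.coef_)
    print(f"C={C}: coefficient norm = {norms[C]:.4f}")
```

The strongly regularized model (C=0.001) ends up with weights close to zero, which is what makes it less flexible.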
This exercise is part of the course Sentiment Analysis in Python.
Exercise instructions
- Split the data into training and test sets.
- Train a logistic regression with a regularization parameter of 1000. Train a second logistic regression with a regularization parameter equal to 0.001.
- Print the accuracy scores of both models on the test set.
Have a go at this exercise by completing this sample code.
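# Import the functions used below (X and y are assumed to be loaded already)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# Split data into training and testing sets
____, ____, ____, ____ = train_test_split(____, ____, test_size=0.2, random_state=123)
# Train a logistic regression with a regularization parameter of 1000 (weak regularization)
log_reg1 = ____(____=1000).fit(X_train, y_train)
# Train a logistic regression with a regularization parameter of 0.001 (strong regularization)
log_reg2 = ____(____=0.001).fit(X_train, y_train)
# Print the accuracies on the test set
print('Accuracy of model 1: ', log_reg1.____(____, ____))
print('Accuracy of model 2: ', log_reg2.____(____, ____))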
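For reference, here is one possible completed version of the exercise run end to end. Because the Amazon reviews data is not available here, it builds a small hypothetical corpus of 20 reviews and vectorizes it with CountVectorizer before splitting, mirroring the exercise where X is prepared in advance. The max_iter=1000 argument is an addition to avoid convergence warnings and is not part of the original scaffold. The accuracies on such a tiny toy set say nothing about the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy reviews: first 10 positive (1), last 10 negative (0)
reviews = [
    "great product, works perfectly", "love it, great value",
    "excellent quality, highly recommend", "works great, very happy",
    "fantastic, exceeded my expectations", "very good value for the price",
    "happy with this purchase", "excellent, would buy again",
    "perfect gift, love it", "great sound and easy to use",
    "terrible quality, broke after a day", "waste of money, terrible",
    "stopped working after a week", "very disappointed, poor quality",
    "do not buy, awful", "cheap and broke quickly",
    "poor sound and hard to use", "returned it, did not work",
    "awful customer service", "not worth the price",
]
y = [1] * 10 + [0] * 10

# BOW features, built before the split as in the exercise
X = CountVectorizer().fit_transform(reviews)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# C=1000: weak regularization; C=0.001: strong regularization
log_reg1 = LogisticRegression(C=1000, max_iter=1000).fit(X_train, y_train)
log_reg2 = LogisticRegression(C=0.001, max_iter=1000).fit(X_train, y_train)

# score() returns the mean accuracy on the given data
acc1 = log_reg1.score(X_test, y_test)
acc2 = log_reg2.score(X_test, y_test)
print('Accuracy of model 1: ', acc1)
print('Accuracy of model 2: ', acc2)
```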