Product reviews with regularization

In this exercise, you will once again work with the reviews dataset of Amazon product reviews. The label vector y contains the sentiment: 1 if positive and 0 otherwise. The matrix X contains all the numeric features, created using a bag-of-words (BOW) approach.
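As a rough illustration of what such inputs look like, the sketch below builds a small bag-of-words matrix with scikit-learn's CountVectorizer. The four toy reviews and the names toy_reviews, X_toy and y_toy are invented for this example; the exercise itself supplies X and y ready-made, and the course may have built them differently.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy reviews (NOT the Amazon reviews dataset)
toy_reviews = [
    "great product works well",
    "terrible quality broke quickly",
    "works great love it",
    "broke after one day terrible",
]
# Matching sentiment labels: 1 = positive, 0 = otherwise
y_toy = [1, 0, 1, 0]

# Each column of X_toy counts how often one vocabulary word occurs in a review
vectorizer = CountVectorizer()
X_toy = vectorizer.fit_transform(toy_reviews)

print(X_toy.shape)                    # (number of reviews, vocabulary size)
print(sorted(vectorizer.vocabulary_)) # the words that became feature columns
```

Each row of the resulting sparse matrix corresponds to one review, so it lines up one-to-one with the entries of the label vector.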

You will need to train two logistic regression models with different levels of regularization and compare how they perform on the test data. Remember that regularization is a way to control the complexity of the model. A more heavily regularized model is less flexible, but it often generalizes better to unseen data. Models with a higher level of regularization are often less accurate on the training data than non-regularized ones.
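One way to see this effect is to compare the size of the learned coefficients. The sketch below assumes scikit-learn's LogisticRegression, whose parameter C is the inverse of the regularization strength, so a small C means a strong penalty. The data is synthetic (generated with make_classification) and the variable names are made up for illustration; nothing here comes from the reviews dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data, used only to illustrate the effect of C
X_demo, y_demo = make_classification(n_samples=500, n_features=20, random_state=0)

# In scikit-learn, C is the INVERSE of regularization strength:
# large C -> weak penalty, small C -> strong penalty
weak_penalty = LogisticRegression(C=1000, max_iter=1000).fit(X_demo, y_demo)
strong_penalty = LogisticRegression(C=0.001, max_iter=1000).fit(X_demo, y_demo)

# A stronger penalty shrinks the coefficients toward zero (a simpler model)
weak_norm = np.linalg.norm(weak_penalty.coef_)
strong_norm = np.linalg.norm(strong_penalty.coef_)
print("Coefficient norm with C=1000: ", weak_norm)
print("Coefficient norm with C=0.001:", strong_norm)
```

The model fitted with the smaller C ends up with much smaller coefficients, which is the sense in which it is less flexible.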

This exercise is part of the course

Sentiment Analysis in Python


Exercise instructions

  • Split the data into training and test sets.
  • Train a logistic regression with a regularization parameter of 1000. Train a second logistic regression with a regularization parameter of 0.001.
  • Print the accuracy scores of both models on the test set.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Split data into training and testing
____, ____, ____, ____ = train_test_split(____, ____, test_size=0.2, random_state=123)

# Train a logistic regression with regularization of 1000
log_reg1 = ____(____=1000).fit(X_train, y_train)
# Train a logistic regression with regularization of 0.001
log_reg2 = ____(____=0.001).fit(X_train, y_train)

# Print the accuracies
print('Accuracy of model 1: ', log_reg1.____(____, ____))
print('Accuracy of model 2: ', log_reg2.____(____, ____))
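
For comparison, here is a minimal sketch of the same split, train and score workflow on synthetic data. It assumes that the regularization parameter in question is scikit-learn's C (the inverse of regularization strength). The names X_syn, y_syn and results are hypothetical, and the accuracies it prints say nothing about the reviews dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and label vector
X_syn, y_syn = make_classification(
    n_samples=1000, n_features=50, n_informative=10, random_state=123
)

# Hold out 20% of the rows for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=123
)

# Fit one model per value of C and record (train accuracy, test accuracy)
results = {}
for C in (1000, 0.001):
    model = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr)
    results[C] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"C={C}: train accuracy={results[C][0]:.3f}, test accuracy={results[C][1]:.3f}")
```

Looking at the train and test accuracy side by side is what shows whether the weakly regularized model is overfitting; the score method of a classifier returns mean accuracy on the data it is given.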