Regularized logistic regression
In Chapter 1, you used logistic regression on the handwritten digits data set. Here, we'll explore the effect of L2 regularization.
The handwritten digits dataset is already loaded, split, and stored in the variables X_train, y_train, X_valid, and y_valid. The variables train_errs and valid_errs are already initialized as empty lists.
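For reference, a setup along these lines would produce those variables. The use of load_digits, train_test_split, and a 75/25 split is an assumption for illustration, not necessarily how the course prepares the data.

# Hypothetical setup sketch: one way to obtain the preloaded variables
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_valid, y_train, y_valid = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=42)

train_errs = list()
valid_errs = list()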
This exercise is part of the course Linear Classifiers in Python.
Exercise instructions
- Loop over the different values of C_value, creating and fitting a LogisticRegression model each time.
- Save the error on the training set and the validation set for each model.
- Create a plot of the training and validation error as a function of the regularization parameter, C (see the note on C after this list).
- Looking at the plot, what's the best value of C?
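A note on C: in scikit-learn's LogisticRegression, C is the inverse of the regularization strength, so a small C means a strong L2 penalty and a large C means a nearly unregularized model. The snippet below only illustrates the parameter; the specific values are arbitrary.

from sklearn.linear_model import LogisticRegression

# Smaller C = stronger L2 regularization; larger C = weaker regularization
strongly_regularized = LogisticRegression(C=0.001)  # heavy penalty on large weights
weakly_regularized = LogisticRegression(C=1000)     # almost no penalty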
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Train and validation errors initialized as empty lists
train_errs = list()
valid_errs = list()
# Loop over values of C_value
C_values = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
for C_value in C_values:
    # Create LogisticRegression object and fit
    lr = ____
    lr.fit(____)

    # Evaluate error rates and append to lists
    train_errs.append(1.0 - lr.score(____))
    valid_errs.append(1.0 - lr.score(____))
# Plot results
plt.semilogx(C_values, train_errs, C_values, valid_errs)
plt.legend(("train", "validation"))
plt.show()
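For completeness, here is one way the blanks could be filled in. This is a sketch rather than the course's official solution, and it assumes the preloaded variables described above (or the loading sketch shown earlier).

# Completed version of the exercise (assumes X_train, y_train, X_valid, y_valid exist)
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt

train_errs = list()
valid_errs = list()

C_values = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
for C_value in C_values:
    # Create LogisticRegression object and fit
    lr = LogisticRegression(C=C_value)
    lr.fit(X_train, y_train)

    # Error rate = 1 - accuracy, since score() returns accuracy
    train_errs.append(1.0 - lr.score(X_train, y_train))
    valid_errs.append(1.0 - lr.score(X_valid, y_valid))

# Plot both error curves against C on a log scale
plt.semilogx(C_values, train_errs, C_values, valid_errs)
plt.legend(("train", "validation"))
plt.show()

The best C is wherever the validation-error curve is lowest: training error tends to keep falling as C grows (weaker regularization), while validation error eventually rises again once the model starts to overfit.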