
Logistic regression and feature selection

In this exercise we'll perform feature selection on the movie review sentiment data set using L1 regularization. The features and targets are already loaded for you in X_train and y_train.

We'll search for the best value of C using scikit-learn's GridSearchCV(), which was covered in the prerequisite course.
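To see why L1 regularization acts as feature selection, it can help to count how many coefficients remain nonzero as C changes. The following is a minimal sketch on synthetic data, not the movie review data set: `X_demo`, `y_demo`, and `nonzero_by_C` are hypothetical names introduced only for illustration. In scikit-learn, C is the inverse of the regularization strength, so smaller values of C generally push more coefficients to exactly zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the review features (an assumption, not the course data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=50, n_informative=5, random_state=0
)

nonzero_by_C = {}
for C in [0.001, 0.01, 0.1, 1, 10]:
    # The L1 penalty needs a solver that supports it, such as liblinear
    model = LogisticRegression(solver="liblinear", penalty="l1", C=C)
    model.fit(X_demo, y_demo)
    # Count the coefficients that the penalty did not shrink to zero
    nonzero_by_C[C] = int(np.count_nonzero(model.coef_))

for C, n_nonzero in nonzero_by_C.items():
    print(f"C={C}: {n_nonzero} nonzero coefficients out of {X_demo.shape[1]}")
```

The exact counts depend on the data, but the weakest regularization (largest C) should keep at least as many features as the strongest.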

This exercise is part of the course

Linear Classifiers in Python


Exercise instructions

  • Instantiate a logistic regression object that uses L1 regularization.
  • Find the value of C that minimizes cross-validation error.
  • Print out the number of selected features for this value of C.

Interactive exercise

Try this exercise by completing the sample code.

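import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Specify L1 regularization
lr = LogisticRegression(solver='liblinear', ____)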

# Instantiate the GridSearchCV object and run the search
searcher = GridSearchCV(lr, {'C':[0.001, 0.01, 0.1, 1, 10]})
searcher.fit(X_train, y_train)

# Report the best parameters
print("Best CV params", searcher.best_params_)

# Find the number of nonzero coefficients (selected features)
best_lr = searcher.best_estimator_
coefs = best_lr.____
print("Total number of features:", coefs.size)
print("Number of selected features:", np.count_nonzero(coefs))
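For reference, here is one way the template above could be completed. This is a sketch under stated assumptions rather than the course's official solution: it fills the blanks with `penalty='l1'` and the fitted estimator's `coef_` attribute, and it replaces the preloaded movie review data with a synthetic data set from `make_classification`, so the printed numbers will not match the real exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption): the exercise preloads X_train and y_train
X_train, y_train = make_classification(
    n_samples=300, n_features=50, n_informative=5, random_state=0
)

# Specify L1 regularization; liblinear is a solver that supports it
lr = LogisticRegression(solver="liblinear", penalty="l1")

# Search over C with the default cross-validation; for a classifier the
# default score is accuracy, so the best C is the one with the lowest CV error
searcher = GridSearchCV(lr, {"C": [0.001, 0.01, 0.1, 1, 10]})
searcher.fit(X_train, y_train)

# Report the best parameters
print("Best CV params", searcher.best_params_)

# Nonzero coefficients correspond to the features the model kept
best_lr = searcher.best_estimator_
coefs = best_lr.coef_
print("Total number of features:", coefs.size)
print("Number of selected features:", np.count_nonzero(coefs))
```

For a binary problem `coef_` has shape `(1, n_features)`, so `coefs.size` equals the number of features.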