Centering and scaling for classification
Now you will bring together scaling and model building into a pipeline for cross-validation.
Your task is to build a pipeline to scale features in the music_df dataset and perform grid search cross-validation using a logistic regression model with different values for the hyperparameter C. The target variable here is "genre", which contains binary values for rock as 1 and any other genre as 0.
StandardScaler, LogisticRegression, and GridSearchCV have all been imported for you.
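As a minimal sketch of what centering and scaling does, the snippet below applies StandardScaler to a small made-up array. The array X_toy and its values are hypothetical and are not taken from music_df; they only illustrate features measured on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features on very different scales
# (for example, a duration in milliseconds and a 0-1 score).
X_toy = np.array([[180000.0, 0.2],
                  [240000.0, 0.8],
                  [300000.0, 0.5]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_toy)

# After scaling, each column has mean close to 0 and standard deviation close to 1.
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

Without this step, the first column would dominate the regularization penalty of a logistic regression simply because its raw values are much larger.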
This exercise is part of the course Supervised Learning with scikit-learn
Exercise instructions
- Build the steps for the pipeline: a StandardScaler() object named "scaler", and a logistic regression model named "logreg".
- Create the parameters, searching 20 equally spaced float values ranging from 0.001 to 1.0 for the logistic regression model's C hyperparameter within the pipeline.
- Instantiate the grid search object.
- Fit the grid search object to the training data.
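The second instruction relies on how scikit-learn addresses a hyperparameter that lives inside a pipeline step: the key has the form "<step name>__<parameter name>", with a double underscore. The sketch below checks that naming and the spacing of the candidate values; demo_pipeline and c_values are illustrative names, not part of the exercise.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative pipeline using the step names from the instructions.
demo_pipeline = Pipeline([("scaler", StandardScaler()),
                          ("logreg", LogisticRegression())])

# Parameters of a step are exposed as "<step name>__<parameter name>".
print("logreg__C" in demo_pipeline.get_params())

# np.linspace returns equally spaced values and includes both endpoints.
c_values = np.linspace(0.001, 1.0, 20)
print(len(c_values), c_values[0], c_values[-1])
```

Calling get_params() on any pipeline is a quick way to list every key that a parameter grid may use.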
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
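# Imports the template relies on beyond those named above
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Build the steps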
steps = [("____", ____()),
         ("____", ____())]
pipeline = Pipeline(steps)
# Create the parameter space
parameters = {"____": np.____(____, ____, 20)}
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=21)
# Instantiate the grid search object
cv = ____(____, param_grid=____)
# Fit to the training data
cv.____(____, ____)
print(cv.best_score_, "\n", cv.best_params_)
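For comparison, here is one possible end-to-end sketch of the same workflow on synthetic data. It is not the course's reference solution: make_classification stands in for music_df, and the names X_demo, y_demo, pipe, grid, and search are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for music_df (an assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=8,
                                     random_state=21)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=21)

# Scaler and model are bundled so they are always fit together.
pipe = Pipeline([("scaler", StandardScaler()),
                 ("logreg", LogisticRegression())])
grid = {"logreg__C": np.linspace(0.001, 1.0, 20)}

# GridSearchCV refits the whole pipeline on each training fold.
search = GridSearchCV(pipe, param_grid=grid)
search.fit(X_tr, y_tr)

print(search.best_score_, "\n", search.best_params_)
print(search.score(X_te, y_te))  # accuracy on the held-out test split
```

Because the scaler sits inside the pipeline, it is refit on the training portion of every cross-validation fold, so the validation folds never influence the scaling statistics.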