Centering and scaling for regression
Now that you have seen the benefits of scaling your data, you will use a pipeline to preprocess the music_df
features and build a lasso regression model to predict a song's loudness.
X_train, X_test, y_train, and y_test have been created from the music_df dataset, where the target is "loudness" and the features are all other columns in the dataset. Lasso and Pipeline have also been imported for you.
Note that "genre" has been converted to a binary feature where 1 indicates a rock song, and 0 represents other genres.
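As a rough sketch of the setup described above, the binary "genre" encoding and the feature/target split could look like the following. The small DataFrame, its genre labels, and the "energy" column are invented for illustration; how music_df was actually preprocessed is an assumption here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for music_df; all values are made up for illustration.
music_df = pd.DataFrame({
    "genre": ["Rock", "Jazz", "Rock", "Classical"],
    "loudness": [-5.2, -12.1, -4.8, -20.3],
    "energy": [0.9, 0.4, 0.8, 0.1],
})

# Encode genre as 1 for rock songs and 0 for every other genre.
music_df["genre"] = np.where(music_df["genre"] == "Rock", 1, 0)

# The target is "loudness"; the features are all remaining columns.
X = music_df.drop("loudness", axis=1).values
y = music_df["loudness"].values
```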
This exercise is part of the course
Supervised Learning with scikit-learn
Exercise instructions
- Import StandardScaler.
- Create the steps for the pipeline object: a StandardScaler object called "scaler", and a lasso model called "lasso" with alpha set to 0.5.
- Instantiate a pipeline with steps to scale and build a lasso regression model.
- Calculate the R-squared value on the test data.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import StandardScaler
____
# Create pipeline steps
steps = [("____", ____()),
         ("____", ____(alpha=____))]
# Instantiate the pipeline
pipeline = ____(____)
pipeline.fit(X_train, y_train)
# Calculate and print R-squared
print(____.____(____, ____))
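
For reference, one way the completed template might look is sketched below. Because music_df is not available here, the sketch generates synthetic features on deliberately different scales and a synthetic target; the data, the random seeds, and the 30% test split are assumptions, so the printed score will not match the one for the real dataset.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the music_df features and target (assumption).
# The four columns are given very different scales on purpose.
rng = np.random.default_rng(42)
scales = np.array([1.0, 100.0, 0.01, 10.0])
X = rng.normal(size=(500, 4)) * scales
y = 3.0 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create pipeline steps: scale first, then fit the lasso model.
steps = [("scaler", StandardScaler()),
         ("lasso", Lasso(alpha=0.5))]

# Instantiate the pipeline and fit it on the training data.
pipeline = Pipeline(steps)
pipeline.fit(X_train, y_train)

# For a regressor, Pipeline.score returns R-squared on the given data.
r_squared = pipeline.score(X_test, y_test)
print(r_squared)
```

Putting the scaler inside the pipeline means its mean and standard deviation are learned from the training data only and then reused when scoring the test data.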