Regression with categorical features
Now that you have created music_dummies, containing binary features for each song's genre, it's time to build a ridge regression model to predict song popularity.
music_dummies has been preloaded for you, along with Ridge, cross_val_score, numpy as np, and a KFold object stored as kf.
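As a reminder of what a DataFrame of binary genre features looks like, here is a minimal sketch. The `music_df` frame, its column values, and the use of `drop_first=True` are assumptions made for illustration; the preloaded music_dummies in the exercise may have been built differently.

```python
import pandas as pd

# Hypothetical stand-in for the raw music data (all values are made up).
music_df = pd.DataFrame({
    "popularity": [60.0, 35.0, 72.0, 48.0],
    "loudness": [-5.2, -11.0, -4.1, -8.3],
    "genre": ["Rock", "Jazz", "Rock", "Classical"],
})

# Replace the text column with one binary column per genre.
# drop_first=True (an assumption here) omits one genre, since it is
# implied when all the remaining genre columns are 0.
music_dummies = pd.get_dummies(music_df, drop_first=True)

print(music_dummies.columns.tolist())
```

The numeric columns pass through unchanged, and the original "genre" column is replaced by its binary indicators.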
The model will be evaluated by calculating the average RMSE, but first you will need to convert the scores for each fold to positive values and take their square root. RMSE summarizes the typical size of the model's prediction errors in the units of the target, so it can be compared against the standard deviation of the target variable, "popularity".
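The scores come back negative because scikit-learn scorers follow a "higher is better" convention, so mean squared error is reported with its sign flipped. A minimal sketch of the conversion, using made-up fold scores rather than real cross-validation output:

```python
import numpy as np

# Made-up per-fold scores, shaped like the output of
# cross_val_score(..., scoring="neg_mean_squared_error").
neg_mse_scores = np.array([-64.0, -81.0, -100.0])

# Flip the sign to recover the MSE, then take the square root fold by fold.
rmse_per_fold = np.sqrt(-neg_mse_scores)

print(rmse_per_fold)           # RMSE for each fold
print(np.mean(rmse_per_fold))  # average RMSE across folds
```

Taking the square root of the negative values directly would produce NaN, which is why the sign flip comes first.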
This exercise is part of the course Supervised Learning with scikit-learn.
Exercise instructions
- Create X, containing all features in music_dummies, and y, consisting of the "popularity" column.
- Instantiate a ridge regression model, setting alpha equal to 0.2.
- Perform cross-validation on X and y using the ridge model, setting cv equal to kf, and using negative mean squared error as the scoring metric.
- Print the RMSE values by converting negative scores to positive and taking the square root.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create X and y
X = ____
y = ____
# Instantiate a ridge model
ridge = ____
# Perform cross-validation
scores = ____(____, ____, ____, cv=____, scoring="____")
# Calculate RMSE
rmse = np.____(____)
print("Average RMSE: {}".format(np.mean(rmse)))
print("Standard Deviation of the target array: {}".format(np.std(y)))
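For comparison, the same workflow can be sketched end to end on synthetic data. Everything about the data here is an assumption: the random features, the coefficients, and the KFold settings (5 shuffled splits) are placeholders, since the exercise's music_dummies and kf are preloaded and not shown.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the features and target (not the course data).
rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 3))
y = 50 + X @ np.array([5.0, -3.0, 2.0]) + rng.normal(scale=4.0, size=n_samples)

# Assumed cross-validation setup; the exercise's kf may differ.
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Ridge regression with the regularization strength from the instructions.
ridge = Ridge(alpha=0.2)

# One negative MSE value per fold.
scores = cross_val_score(ridge, X, y, cv=kf, scoring="neg_mean_squared_error")

# Convert to positive MSE, then to RMSE.
rmse = np.sqrt(-scores)
print("Average RMSE: {}".format(np.mean(rmse)))
print("Standard deviation of the target array: {}".format(np.std(y)))
```

An average RMSE below the target's standard deviation suggests the model does better than always predicting the mean.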