Cross-validation for R-squared
Cross-validation is a key approach to evaluating a model. It makes full use of the available data: across the folds, every observation is used for both training and testing.
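To see how the folds cover the data, here is a minimal sketch on a made-up array of 12 observations. The names X_demo, kf_demo and test_seen are illustrative and are not part of the exercise. It uses the same KFold settings the exercise asks for and checks that each observation lands in a test fold exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 12 observations (made up purely for illustration)
X_demo = np.arange(12).reshape(-1, 1)

# Same settings as the exercise: six folds, shuffled, seed of 5
kf_demo = KFold(n_splits=6, shuffle=True, random_state=5)

test_seen = []
for fold, (train_idx, test_idx) in enumerate(kf_demo.split(X_demo), start=1):
    # Each fold holds out 2 observations and trains on the other 10
    print(f"Fold {fold}: train size={len(train_idx)}, test indices={test_idx.tolist()}")
    test_seen.extend(test_idx.tolist())

# Every observation appears in a test fold exactly once
all_tested_once = sorted(test_seen) == list(range(len(X_demo)))
print(all_tested_once)
```

Each observation is also part of the training set in the other five folds, which is the sense in which the model is trained and tested on all of the data.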
In this exercise, you will build a linear regression model, then use 6-fold cross-validation to assess how well it predicts sales from social media advertising expenditure. You will display the individual score for each of the six folds.
The sales_df dataset has been split into y for the target variable and X for the features, and preloaded for you. LinearRegression has been imported from sklearn.linear_model.
This exercise is part of the course Supervised Learning with scikit-learn.
Exercise instructions
- Import KFold and cross_val_score.
- Create kf by calling KFold(), setting the number of splits to six, shuffle to True, and setting a seed of 5.
- Perform cross-validation using reg on X and y, passing kf to cv.
- Print the cv_scores.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the necessary modules
from ____.____ import ____, ____
# Create a KFold object
kf = ____(n_splits=____, shuffle=____, random_state=____)
reg = LinearRegression()
# Compute 6-fold cross-validation scores
cv_scores = ____(____, ____, ____, cv=____)
# Print scores
print(____)
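One possible way to carry out these steps is sketched below. Because sales_df is not available outside the exercise environment, the sketch uses synthetic stand-in data. X_synth, y_synth, the slope of 3.0 and the noise level are all assumptions made for illustration and say nothing about the real dataset. For a regressor, cross_val_score uses the estimator's own score method by default, which for LinearRegression is R-squared.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the preloaded data (hypothetical values, not sales_df)
rng = np.random.default_rng(0)
X_synth = rng.uniform(0, 100, size=(120, 1))               # stand-in for advertising spend
y_synth = 3.0 * X_synth[:, 0] + rng.normal(0, 10, 120)     # stand-in for sales, with noise

# Six shuffled folds with a fixed seed, as in the instructions
kf = KFold(n_splits=6, shuffle=True, random_state=5)
reg = LinearRegression()

# One R-squared score per fold (the default scoring for a regressor)
cv_scores = cross_val_score(reg, X_synth, y_synth, cv=kf)

# Print the individual scores, then a summary across folds
print(cv_scores)
print(cv_scores.mean(), cv_scores.std())
```

Reporting the mean and standard deviation alongside the six individual scores gives a quick sense of both the typical fit and how much it varies from fold to fold.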