Model selection
Both regularization and cross validation are powerful tools for model selection. Regularization helps prevent overfitting, and cross validation ensures that your models are evaluated properly. In this exercise, you will use regularization and cross validation together and see whether or not the resulting models differ significantly. You will calculate precision only, although the same exercise can easily be done for recall and other evaluation metrics as well.
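To make the connection concrete, here is a minimal sketch (separate from the exercise workspace) of how max_depth acts as the regularization knob for a decision tree: an unconstrained tree can memorize the training data, while a depth-limited tree tends to hold up better under cross validation. The synthetic dataset and the depth values used here are illustrative assumptions, not the course's data.

# Illustrative sketch only: synthetic data, not the course's X_train/y_train.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for depth in [None, 3]:  # None = unconstrained tree, 3 = depth-limited (regularized) tree
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    train_acc = clf.fit(X, y).score(X, y)          # accuracy on the data it was fit on
    cv_acc = cross_val_score(clf, X, y, cv=4).mean()  # average accuracy across 4 folds
    print("max_depth = %s  train accuracy: %.3f  CV accuracy: %.3f"
          % (depth, train_acc, cv_acc))

The gap between training accuracy and cross-validated accuracy is what the exercise below asks you to inspect, using precision instead of accuracy.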
X_train, y_train, X_test, and y_test are available in your workspace. pandas as pd, numpy as np, and sklearn are also available in your workspace. Both precision_score() and recall_score() from sklearn.metrics are available, as well as KFold() and cross_val_score() from sklearn.model_selection.
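If you are recreating this setup outside the course workspace, the imports described above would look roughly as follows; DecisionTreeClassifier is added here only because the instructions call for a decision tree, and the train/test split itself is assumed to have been done earlier.

# Imports matching the objects described as available in the workspace.
import pandas as pd
import numpy as np
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier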
This exercise is part of the course Predicting CTR with Machine Learning in Python.
Exercise instructions
- Set up a K-Fold cross validation with four splits using n_splits and assign it to k_fold.
- Create a decision tree classifier.
- Use k_fold to run cross validation and evaluate the precision of your decision tree model for the given max_depth value.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Iterate over different levels of max depth and set up k-fold
for max_depth_val in [3, 5, 10]:
    k_fold = ____(____ = 4, random_state = 0, shuffle = True)
    clf = ____(____ = max_depth_val)
    print("Evaluating Decision Tree for max_depth = %s" %(max_depth_val))
    y_pred = clf.fit(____, ____).predict(____)
    # Calculate precision for cross validation and test
    cv_precision = ____(
        ____, X_train, y_train, cv = k_fold, scoring = 'precision_weighted')
    precision = ____(y_test, y_pred, average = 'weighted')
    print("Cross validation Precision: %s" %(cv_precision))
    print("Test Precision: %s" %(precision))