Cross validation
Cross validation is a technique for checking a model's holdout performance. It helps ensure that the measured test performance is not an artifact of one particular split of the data. In this exercise, you will use sklearn to run a K-fold cross validation, using the KFold() class to assess precision and recall for a decision tree.
X_train, y_train, X_test, and y_test are available in your workspace. pandas as pd, numpy as np, and sklearn are also available in your workspace. KFold() and cross_val_score() from sklearn.model_selection are both available as well.
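To make the splitting step concrete, here is a minimal sketch of how KFold() partitions a dataset into folds. The toy array X_toy and the variable names are illustrative assumptions and are not part of the exercise workspace.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 8 samples with 2 features each
X_toy = np.arange(16).reshape(8, 2)

# Same settings as the exercise: four shuffled splits with a fixed seed
k_fold = KFold(n_splits=4, shuffle=True, random_state=0)

fold_sizes = []
validation_indices = []
for fold, (train_idx, val_idx) in enumerate(k_fold.split(X_toy)):
    # Each fold holds out a different subset of rows for validation
    fold_sizes.append((len(train_idx), len(val_idx)))
    validation_indices.extend(val_idx.tolist())
    print("Fold %d: train=%s, validation=%s" % (fold, train_idx, val_idx))
```

Across the four folds, every row is held out for validation exactly once, which is why the resulting scores depend less on any single split.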
This exercise is part of the course Predicting CTR with Machine Learning in Python.
Exercise instructions
- Create a decision tree classifier.
- Set up a K-Fold cross validation with four splits and assign it to k_fold.
- Use k_fold to run cross validation using cross_val_score() to evaluate the precision and recall of your model (and not using recall_score() or precision_score()!).
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create model
clf = ____
# Set up k-fold
k_fold = ____(n_splits = 4, random_state = 0, shuffle = True)
# Evaluate precision and recall for each fold
precision = ____(
clf, X_train, ____, cv = ____, scoring = 'precision_weighted')
recall = ____(
clf, X_train, ____, cv = ____, scoring = 'recall_weighted')
print("Precision scores: %s" %(precision))
print("Recall scores: %s" %(recall))
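For reference, one possible way to carry out the steps above end to end might look like the following sketch. It runs on synthetic data from make_classification() because the exercise's X_train and y_train are not available outside the workspace. The names X_demo and y_demo, the dataset size, and the choice of DecisionTreeClassifier(random_state=0) are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for X_train / y_train (assumption, not the course data)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, random_state=0)

# Create model
clf = DecisionTreeClassifier(random_state=0)

# Set up k-fold with four shuffled splits
k_fold = KFold(n_splits=4, random_state=0, shuffle=True)

# Evaluate precision and recall for each fold
precision = cross_val_score(
    clf, X_demo, y_demo, cv=k_fold, scoring='precision_weighted')
recall = cross_val_score(
    clf, X_demo, y_demo, cv=k_fold, scoring='recall_weighted')

print("Precision scores: %s" % (precision))
print("Recall scores: %s" % (recall))
```

Each call returns an array with one score per fold, so four splits give four precision values and four recall values. A large spread between folds would suggest that performance is sensitive to how the data is split.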