Model selection
Regularization and cross validation are both powerful tools in model selection: regularization helps prevent overfitting, and cross validation ensures that your models are evaluated properly. In this exercise, you will use the two together, here by limiting the max_depth of a decision tree, and see whether the resulting models differ significantly. You will calculate precision only, although the same exercise can easily be done for recall and other evaluation metrics as well.
X_train, y_train, X_test, and y_test are available in your workspace, as are pandas as pd, numpy as np, and sklearn. Both precision_score() and recall_score() from sklearn.metrics are available, as well as KFold() and cross_val_score() from sklearn.model_selection.
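For reference, cross_val_score() accepts a KFold object through its cv argument. Below is a minimal sketch of that pattern; the make_classification toy data is an illustration only, not the course's CTR dataset:

from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the CTR features (illustrative only)
X, y = make_classification(n_samples=200, random_state=0)

# Four shuffled folds, made reproducible via random_state
k_fold = KFold(n_splits=4, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(max_depth=3), X, y,
                         cv=k_fold, scoring='precision_weighted')
print(scores.mean())  # mean weighted precision across the four folds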
This exercise is part of the course
Predicting CTR with Machine Learning in Python
Instructions
- Set up a K-Fold cross validation with four splits using n_splits and assign it to k_fold.
- Create a decision tree classifier.
- Use k_fold to run cross validation and evaluate the precision of your decision tree model for the given max_depth value.
Interactive hands-on exercise
Try this exercise by completing the sample code below.
# Iterate over different levels of max depth and set up k-fold
for max_depth_val in [3, 5, 10]:
    k_fold = ____(____ = 4, random_state = 0, shuffle = True)
    clf = ____(____ = max_depth_val)
    print("Evaluating Decision Tree for max_depth = %s" %(max_depth_val))
    y_pred = clf.fit(____, ____).predict(____)
    # Calculate precision for cross validation and test
    cv_precision = ____(
        ____, X_train, y_train, cv = k_fold, scoring = 'precision_weighted')
    precision = ____(y_test, y_pred, average = 'weighted')
    print("Cross validation Precision: %s" %(cv_precision))
    print("Test Precision: %s" %(precision))