
Model selection

Both regularization and cross validation are powerful tools for model selection. Regularization helps prevent overfitting, and cross validation ensures that your models are evaluated properly. In this exercise, you will use regularization and cross validation together and see whether the resulting models differ significantly. You will calculate precision only, although the same exercise can easily be done for recall and other evaluation metrics as well.
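
To make the metric concrete, here is a minimal sketch of how precision_score() and recall_score() are called with weighted averaging; the labels below are made up purely for illustration.

from sklearn.metrics import precision_score, recall_score

# Made-up labels purely for illustration
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# average='weighted' weights each class by its support, which matters
# for imbalanced data such as click-through records
print(precision_score(y_true, y_pred, average='weighted'))
print(recall_score(y_true, y_pred, average='weighted'))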

X_train, y_train, X_test, and y_test are available in your workspace, as are pandas as pd, numpy as np, and sklearn. Both precision_score() and recall_score() from sklearn.metrics are available, as well as KFold() and cross_val_score() from sklearn.model_selection.
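
Outside the course workspace, an equivalent setup would look like the following; the DecisionTreeClassifier import is an assumption based on the model this exercise builds.

import pandas as pd
import numpy as np
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier  # assumed: the classifier used below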

This exercise is part of the course Predicting CTR with Machine Learning in Python.

Exercise instructions

  • Set up a K-Fold cross validation with four splits using n_splits and assign it to k_fold.
  • Create a decision tree classifier.
  • Use k_fold to run cross validation and evaluate the precision of your decision tree model for the given max_depth value.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Iterate over different levels of max depth and set up k-fold
for max_depth_val in [3, 5, 10]:
    k_fold = ____(____=4, random_state=0, shuffle=True)
    clf = ____(____=max_depth_val)
    print("Evaluating Decision Tree for max_depth = %s" % (max_depth_val))
    y_pred = clf.fit(____, ____).predict(____)

    # Calculate precision for cross validation and test
    cv_precision = ____(
        ____, X_train, y_train, cv=k_fold, scoring='precision_weighted')
    precision = ____(y_test, y_pred, average='weighted')
    print("Cross validation Precision: %s" % (cv_precision))
    print("Test Precision: %s" % (precision))
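
For reference, here is one possible completed version of the exercise, written as a self-contained sketch rather than the official solution: the make_classification() data and train/test split are stand-ins so the code runs outside the course workspace, where X_train, y_train, X_test, and y_test are already defined.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score

# Stand-in data; the course workspace already provides these four arrays
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for max_depth_val in [3, 5, 10]:
    # Shallower trees are more strongly regularized
    k_fold = KFold(n_splits=4, random_state=0, shuffle=True)
    clf = DecisionTreeClassifier(max_depth=max_depth_val)
    print("Evaluating Decision Tree for max_depth = %s" % (max_depth_val))
    y_pred = clf.fit(X_train, y_train).predict(X_test)

    # Weighted precision on the cross-validation folds and on the test set
    cv_precision = cross_val_score(
        clf, X_train, y_train, cv=k_fold, scoring='precision_weighted')
    precision = precision_score(y_test, y_pred, average='weighted')
    print("Cross validation Precision: %s" % (cv_precision))
    print("Test Precision: %s" % (precision))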