Identify optimal tree depth

Now you will tune the max_depth parameter of the decision tree to find the value that reduces overfitting while still maintaining good model performance. You will loop over several max_depth candidates, fit a decision tree for each one, and calculate its recall score on the test data.

The list depth_list with the parameter candidates has been loaded for you. The depth_tuning array has been built for you with 2 columns: the first is filled with the depth candidates, and the second is a placeholder for the recall score. The features and target variables have been loaded as train_X and train_Y for the training data, and test_X and test_Y for the test data. The numpy and pandas libraries are loaded as np and pd, respectively.
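
Outside the course environment, the preloaded depth_list and depth_tuning objects could be built roughly as follows. This is a minimal sketch: the candidate depths 2 through 14 are an assumed range for illustration, since the exercise does not state the actual values.

```python
import numpy as np

# Assumed candidate depths; the exercise does not state the actual values.
depth_list = list(range(2, 15))

# Two columns: column 0 holds the depth candidates,
# column 1 is a zero-filled placeholder for the recall score.
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list
```

Keeping the candidates and their scores in one array means each loop iteration only has to write to depth_tuning[index, 1].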

This exercise is part of the course

Machine Learning for Marketing in Python

Exercise instructions

  • Run a for loop over the range from 0 to the length of the list depth_list.
  • For each depth candidate, initialize and fit a decision tree classifier and predict churn on test data.
  • For each depth candidate, calculate the recall score by using the recall_score() function and store it in the second column of depth_tuning.
  • Create a pandas DataFrame out of depth_tuning with the appropriate column names.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Run a for loop over the range of depth list length
for index in ___(0, len(depth_list)):
  # Initialize and fit decision tree with the `max_depth` candidate
  mytree = DecisionTreeClassifier(___=depth_list[index])
  mytree.fit(___, train_Y)
  # Predict churn on the testing data
  pred_test_Y = mytree.predict(___)
  # Calculate the recall score 
  depth_tuning[index,1] = ___(test_Y, ___)

# Name the columns and print the array as pandas DataFrame
col_names = ['Max_Depth','Recall']
print(pd.DataFrame(depth_tuning, columns=___))
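
To try the same workflow end to end outside the course environment, the steps from the instructions can be sketched on synthetic data. Everything about the data here is an assumption: make_classification stands in for the churn dataset, the depth range 2 through 14 is illustrative, and random_state is fixed only for reproducibility. The recall values it prints will not match the course results.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (assumption, not the course dataset).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
train_X, test_X, train_Y, test_Y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumed depth candidates and the two-column results array.
depth_list = list(range(2, 15))
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list

for index in range(0, len(depth_list)):
    # Fit a tree limited to the current depth candidate.
    mytree = DecisionTreeClassifier(max_depth=depth_list[index], random_state=0)
    mytree.fit(train_X, train_Y)
    # Predict on the held-out data and store the test recall.
    pred_test_Y = mytree.predict(test_X)
    depth_tuning[index, 1] = recall_score(test_Y, pred_test_Y)

col_names = ['Max_Depth', 'Recall']
results = pd.DataFrame(depth_tuning, columns=col_names)
print(results)
```

Reading the resulting table from shallow to deep trees shows where test recall stops improving, which is one way to pick a depth that limits overfitting.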