Identify optimal tree depth
Now you will tune the max_depth parameter of the decision tree to find the value that reduces overfitting while still maintaining good model performance metrics. You will run a for loop through multiple max_depth parameter values, fit a decision tree for each, and then calculate performance metrics.
The list called depth_list with the parameter candidates has been loaded for you. The depth_tuning array has been built for you with 2 columns: the first is filled with the depth candidates, and the second is a placeholder for the recall score. Also, the features and target variables have been loaded as train_X, train_Y for the training data, and test_X, test_Y for the test data. Both the numpy and pandas libraries are loaded as np and pd respectively.
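To make the setup concrete, here is a minimal sketch of how a depth_tuning array like the one described could be built. The candidate depths below are hypothetical; the actual values preloaded in the exercise are not shown on this page and may differ.

```python
import numpy as np

# Hypothetical candidate depths (an assumption; the course's depth_list may differ)
depth_list = list(range(2, 15))

# Two columns: column 0 holds the depth candidates,
# column 1 is a zero-filled placeholder for the recall score
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list

print(depth_tuning[:3])
```

Each row then corresponds to one candidate, so the loop index can be used both to read the depth from depth_list and to write the score into depth_tuning.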
This exercise is part of the course
Machine Learning for Marketing in Python
Exercise instructions
- Run a for loop over the range from 0 to the length of the list depth_list.
- For each depth candidate, initialize and fit a decision tree classifier and predict churn on test data.
- For each depth candidate, calculate the recall score by using the recall_score() function and store it in the second column of depth_tuning.
- Create a pandas DataFrame out of depth_tuning with the appropriate column names.
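As a quick illustration of what recall_score() measures, the sketch below uses made-up labels (not course data). Recall is the share of actual positives, here churned customers, that the model correctly flags.

```python
from sklearn.metrics import recall_score

# Toy labels invented for illustration: 1 = churned, 0 = retained
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

# Recall = true positives / (true positives + false negatives) = 3 / (3 + 1)
recall = recall_score(y_true, y_pred)
print(recall)  # 0.75
```

Note the argument order: the true labels come first and the predictions second, which matches the recall call in the sample code.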
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Run a for loop over the range of depth list length
for index in ___(0, len(depth_list)):
    # Initialize and fit decision tree with the `max_depth` candidate
    mytree = DecisionTreeClassifier(___=depth_list[index])
    mytree.fit(___, train_Y)
    # Predict churn on the testing data
    pred_test_Y = mytree.predict(___)
    # Calculate the recall score
    depth_tuning[index,1] = ___(test_Y, ___)
# Name the columns and print the array as pandas DataFrame
col_names = ['Max_Depth','Recall']
print(pd.DataFrame(depth_tuning, columns=___))
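One possible way to fill in the blanks is sketched below. Because the course's churn data is not available here, the example generates a synthetic dataset with make_classification as a stand-in, uses a hypothetical depth_list, and fixes random_state for repeatability; all three are assumptions rather than part of the original exercise.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (assumption, not the course dataset)
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.7, 0.3], random_state=42)
train_X, test_X, train_Y, test_Y = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Hypothetical depth candidates and the two-column results array
depth_list = list(range(2, 15))
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list

# Run a for loop over the range of depth list length
for index in range(0, len(depth_list)):
    # Initialize and fit decision tree with the max_depth candidate
    mytree = DecisionTreeClassifier(max_depth=depth_list[index], random_state=42)
    mytree.fit(train_X, train_Y)
    # Predict churn on the testing data
    pred_test_Y = mytree.predict(test_X)
    # Calculate the recall score and store it in the second column
    depth_tuning[index, 1] = recall_score(test_Y, pred_test_Y)

# Name the columns and print the array as pandas DataFrame
col_names = ['Max_Depth', 'Recall']
results = pd.DataFrame(depth_tuning, columns=col_names)
print(results)
```

Reading the resulting table, a reasonable choice is usually the smallest depth beyond which test recall stops improving, since deeper trees tend to overfit the training data.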