Identify optimal tree depth
Now you will tune the max_depth parameter of the decision tree to find the depth that reduces overfitting while still keeping good performance metrics. You will loop over several max_depth candidates, fit a decision tree for each one, and calculate its performance metrics on the test data.
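To see why limiting max_depth matters, here is a minimal sketch on synthetic data. make_classification stands in for the churn data (an assumption; these are not the course's data or results). It compares an unrestricted tree (max_depth=None) with a shallow one: the unrestricted tree fits the training data perfectly but scores lower on the test data, which is the over-fitting that tuning max_depth aims to reduce.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (assumption: the real train_X/test_X are not available here)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=42)
train_X, test_X, train_Y, test_Y = train_test_split(X, y, test_size=0.25, random_state=42)

# Compare an unrestricted tree with a shallow one; store (train, test) accuracy per depth
scores = {}
for depth in [None, 3]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(train_X, train_Y)
    scores[depth] = (tree.score(train_X, train_Y), tree.score(test_X, test_Y))
    print(f"max_depth={depth}: train accuracy={scores[depth][0]:.3f}, "
          f"test accuracy={scores[depth][1]:.3f}")
```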
The list depth_list containing the parameter candidates has been loaded for you. The depth_tuning array has also been built for you with 2 columns: the first is filled with the depth candidates, and the second is a placeholder for the recall score. The features and target variables have been loaded as train_X and train_Y for the training data, and test_X and test_Y for the test data. The numpy and pandas libraries are loaded as np and pd, respectively.
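The exercise builds depth_tuning for you. A sketch of how such an array could be constructed, assuming a hypothetical depth_list of 2 through 14 (the actual candidate values are not given in the exercise):

```python
import numpy as np

# Hypothetical candidate depths; the exercise's actual depth_list is not shown
depth_list = list(range(2, 15))

# Column 0: depth candidates; column 1: placeholder for recall, filled in later by the loop
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list

print(depth_tuning[:3])
```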
This exercise is part of the course Machine Learning for Marketing in Python.
Exercise instructions
- Run a for loop over the range from 0 to the length of the list depth_list.
- For each depth candidate, initialize and fit a decision tree classifier and predict churn on the test data.
- For each depth candidate, calculate the recall score using the recall_score() function and store it in the second column of depth_tuning.
- Create a pandas DataFrame from depth_tuning with the appropriate column names.
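Recall is the share of actual churners (label 1) that the model correctly flags: TP / (TP + FN). A small check with made-up labels, comparing a manual count against scikit-learn's recall_score(y_true, y_pred), which takes the true labels first, as in the exercise:

```python
from sklearn.metrics import recall_score

# Toy labels: 1 = churned, 0 = stayed (made-up values for illustration)
test_Y      = [1, 1, 1, 1, 0, 0, 0, 1]
pred_test_Y = [1, 0, 1, 1, 0, 1, 0, 0]

# Recall = true positives / (true positives + false negatives)
tp = sum(1 for t, p in zip(test_Y, pred_test_Y) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(test_Y, pred_test_Y) if t == 1 and p == 0)
manual_recall = tp / (tp + fn)

sk_recall = recall_score(test_Y, pred_test_Y)
print(manual_recall, sk_recall)  # both equal 0.6 (3 of 5 churners caught)
```

The false positive at index 5 does not affect recall; it would lower precision instead.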
Complete the sample code below by filling in each ___ blank.
# DecisionTreeClassifier and recall_score come from scikit-learn
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

# Run a for loop over the range of depth list length
for index in ___(0, len(depth_list)):
    # Initialize and fit decision tree with the `max_depth` candidate
    mytree = DecisionTreeClassifier(___=depth_list[index])
    mytree.fit(___, train_Y)
    # Predict churn on the testing data
    pred_test_Y = mytree.predict(___)
    # Calculate the recall score and store it in the second column
    depth_tuning[index,1] = ___(test_Y, ___)
# Name the columns and print the array as a pandas DataFrame
col_names = ['Max_Depth','Recall']
print(pd.DataFrame(depth_tuning, columns=___))
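For reference, a complete runnable version of the loop, sketched on synthetic data because train_X, train_Y, test_X and test_Y are not available outside the exercise. The dataset, the depth_list values, random_state, and the depth-selection rule at the end (the shallowest depth whose recall is within a hypothetical tolerance of 0.02 of the best) are all assumptions, not part of the course:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (assumption: not the course dataset)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           weights=[0.75, 0.25], flip_y=0.05, random_state=0)
train_X, test_X, train_Y, test_Y = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hypothetical depth candidates and the two-column tuning array
depth_list = list(range(2, 15))
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list

for index in range(0, len(depth_list)):
    # random_state makes tie-breaking between equally good splits reproducible
    mytree = DecisionTreeClassifier(max_depth=depth_list[index], random_state=0)
    mytree.fit(train_X, train_Y)
    pred_test_Y = mytree.predict(test_X)
    depth_tuning[index, 1] = recall_score(test_Y, pred_test_Y)

col_names = ['Max_Depth', 'Recall']
results = pd.DataFrame(depth_tuning, columns=col_names)
print(results)

# Hypothetical selection rule: shallowest depth within `tolerance` of the best recall
tolerance = 0.02
candidates = results[results['Recall'] >= results['Recall'].max() - tolerance]
chosen_depth = int(candidates['Max_Depth'].min())
print("Chosen max_depth:", chosen_depth)
```

Preferring the shallowest depth within the tolerance reflects the goal stated at the start of the exercise: a simpler tree that over-fits less while keeping recall close to the best observed value.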