Identify optimal tree depth
Now you will tune the max_depth parameter of the decision tree to find the value that reduces overfitting while still maintaining good model performance metrics. You will run a for loop over multiple max_depth candidate values, fit a decision tree for each, and then calculate its performance metrics.
The list depth_list, containing the parameter candidates, has been loaded for you. The depth_tuning array has been built for you with 2 columns: the first is filled with the depth candidates, and the second is a placeholder for the recall score. The features and target variables have also been loaded as train_X and train_Y for the training data, and test_X and test_Y for the test data. The numpy and pandas libraries are loaded as np and pd, respectively.
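The preloaded objects are not shown in the exercise, so the following is only a sketch of how depth_list and depth_tuning might have been set up. The candidate depths used here (2 through 14) are an assumption for illustration, not the course's actual values.

```python
import numpy as np

# Hypothetical candidate depths; the values actually used in the exercise are not shown
depth_list = list(range(2, 15))

# Two-column array: column 0 holds the depth candidates,
# column 1 is a zero-filled placeholder for the recall scores
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list
```

Building the array up front means the loop only has to write one value per candidate, at depth_tuning[index, 1].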
This exercise is part of the course
Machine Learning for Marketing in Python
Instructions
- Run a for loop over the range from 0 to the length of the list depth_list.
- For each depth candidate, initialize and fit a decision tree classifier and predict churn on the test data.
- For each depth candidate, calculate the recall score using the recall_score() function and store it in the second column of depth_tuning.
- Create a pandas DataFrame out of depth_tuning with the appropriate column names.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Run a for loop over the range of depth list length
for index in ___(0, len(depth_list)):
    # Initialize and fit decision tree with the `max_depth` candidate
    mytree = DecisionTreeClassifier(___=depth_list[index])
    mytree.fit(___, train_Y)
    # Predict churn on the testing data
    pred_test_Y = mytree.predict(___)
    # Calculate the recall score
    depth_tuning[index,1] = ___(test_Y, ___)

# Name the columns and print the array as pandas DataFrame
col_names = ['Max_Depth','Recall']
print(pd.DataFrame(depth_tuning, columns=___))
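For reference, here is a self-contained sketch of the same tuning loop that can be run outside the exercise environment. The churn data is not available here, so it substitutes a synthetic dataset from scikit-learn's make_classification. The dataset, the depth range 2 to 7, and the random_state values are all assumptions made for illustration, and the resulting scores say nothing about the course data.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (assumption: binary target, 1 = churn)
X, y = make_classification(n_samples=500, n_features=8,
                           weights=[0.7, 0.3], random_state=0)
train_X, test_X, train_Y, test_Y = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Hypothetical depth candidates and the two-column results array
depth_list = list(range(2, 8))
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list

for index in range(0, len(depth_list)):
    # Fit a tree limited to the current depth candidate
    mytree = DecisionTreeClassifier(max_depth=depth_list[index], random_state=0)
    mytree.fit(train_X, train_Y)
    # Predict on the held-out data and record the recall
    pred_test_Y = mytree.predict(test_X)
    depth_tuning[index, 1] = recall_score(test_Y, pred_test_Y)

col_names = ['Max_Depth', 'Recall']
results = pd.DataFrame(depth_tuning, columns=col_names)
print(results)
```

Reading the printed table row by row shows how test recall changes as the tree is allowed to grow deeper. A depth past which recall stops improving is a reasonable candidate for limiting overfitting.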