Identify optimal tree depth
Now you will tune the max_depth parameter of the decision tree to find the value that reduces overfitting while still maintaining good model performance. You will loop over several max_depth candidates, fit a decision tree for each, and calculate its recall score on the test data.
The list called depth_list with the parameter candidates has been loaded for you. The depth_tuning array has been built for you with 2 columns, with the first one being filled with the depth candidates, and the next one being a placeholder for the recall score. Also, the features and target variables have been loaded as train_X, train_Y for the training data, and test_X, test_Y for the test data. Both numpy and pandas libraries are loaded as np and pd respectively.
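The preloaded objects described above could be built roughly as follows. This is only a sketch: the actual candidate values in depth_list are not given in the exercise, so the range used here is an assumption.

```python
import numpy as np

# Hypothetical candidate depths; the exercise does not state the real values.
depth_list = list(range(2, 15))

# Two columns: column 0 holds the depth candidates,
# column 1 is a zero-filled placeholder for the recall score.
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list
```

Storing the candidates and their scores side by side in one array makes it straightforward to convert the results to a DataFrame later.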
This exercise is part of the course
Machine Learning for Marketing in Python
Instructions
- Run a for loop over the range from 0 to the length of the list depth_list.
- For each depth candidate, initialize and fit a decision tree classifier and predict churn on test data.
- For each depth candidate, calculate the recall score by using the recall_score() function and store it in the second column of depth_tuning.
- Create a pandas DataFrame out of depth_tuning with the appropriate column names.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Run a for loop over the range of depth list length
for index in ___(0, len(depth_list)):
    # Initialize and fit decision tree with the `max_depth` candidate
    mytree = DecisionTreeClassifier(___=depth_list[index])
    mytree.fit(___, train_Y)
    # Predict churn on the testing data
    pred_test_Y = mytree.predict(___)
    # Calculate the recall score
    depth_tuning[index,1] = ___(test_Y, ___)
# Name the columns and print the array as pandas DataFrame
col_names = ['Max_Depth','Recall']
print(pd.DataFrame(depth_tuning, columns=___))
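For reference, one way the template might be completed is sketched below. The churn data is not available outside the exercise, so this version swaps in a synthetic dataset from scikit-learn's make_classification; the dataset, the depth candidates, and the random_state values are all assumptions made so the sketch runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (assumption: the real data is preloaded in the exercise).
X, Y = make_classification(n_samples=500, n_features=10, random_state=0)
train_X, test_X, train_Y, test_Y = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Hypothetical depth candidates and the two-column results array.
depth_list = list(range(2, 8))
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list

# Run a for loop over the range of depth list length
for index in range(0, len(depth_list)):
    # Initialize and fit decision tree with the `max_depth` candidate
    mytree = DecisionTreeClassifier(max_depth=depth_list[index], random_state=0)
    mytree.fit(train_X, train_Y)
    # Predict the target on the testing data
    pred_test_Y = mytree.predict(test_X)
    # Calculate the recall score (true labels first, predictions second)
    depth_tuning[index, 1] = recall_score(test_Y, pred_test_Y)

# Name the columns and print the array as pandas DataFrame
col_names = ['Max_Depth', 'Recall']
results = pd.DataFrame(depth_tuning, columns=col_names)
print(results)
```

Reading down the Recall column shows how test recall changes with depth; a depth beyond which recall stops improving is a reasonable candidate, since deeper trees are more prone to overfitting.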