Identify optimal tree depth
Now you will tune the max_depth parameter of the decision tree to find the value that reduces overfitting while still maintaining good model performance. You will run a for loop over multiple max_depth candidate values, fit a decision tree for each, and then calculate its performance metrics.
The list called depth_list with the parameter candidates has been loaded for you. The depth_tuning array has been built for you with 2 columns: the first is filled with the depth candidates, and the second is a placeholder for the recall score. The features and target variables have also been loaded as train_X and train_Y for the training data, and test_X and test_Y for the test data. The numpy and pandas libraries are loaded as np and pd, respectively.
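The exercise does not show how depth_tuning was built, so the following is only a minimal sketch of one way such an array could be set up. The candidate depths used here (2 through 14) are an assumption for illustration; the preloaded depth_list may contain different values.

```python
import numpy as np

# Hypothetical candidate depths; the exercise preloads its own depth_list
depth_list = list(range(2, 15))

# Two columns: column 0 holds the depth candidates,
# column 1 is a zero-filled placeholder for the recall score
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list
```

Keeping the candidates and their scores in a single array means that row index in depth_tuning lines up with position index in depth_list, which is what the loop in the exercise relies on.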
This exercise is part of the course
Machine Learning for Marketing in Python
Exercise instructions
- Run a for loop over the range from 0 to the length of the list depth_list.
- For each depth candidate, initialize and fit a decision tree classifier and predict churn on test data.
- For each depth candidate, calculate the recall score by using the recall_score() function and store it in the second column of depth_tuning.
- Create a pandas DataFrame out of depth_tuning with the appropriate column names.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Run a for loop over the range of depth list length
for index in ___(0, len(depth_list)):
    # Initialize and fit decision tree with the `max_depth` candidate
    mytree = DecisionTreeClassifier(___=depth_list[index])
    mytree.fit(___, train_Y)
    # Predict churn on the testing data
    pred_test_Y = mytree.predict(___)
    # Calculate the recall score
    depth_tuning[index,1] = ___(test_Y, ___)

# Name the columns and print the array as pandas DataFrame
col_names = ['Max_Depth','Recall']
print(pd.DataFrame(depth_tuning, columns=___))
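For reference, here is one possible way the template could be completed, written as a self-contained sketch. The churn data from the course is not available here, so the example substitutes a synthetic dataset from scikit-learn's make_classification. The depth candidates and the random_state values are likewise assumptions, added only so that the snippet runs on its own and gives repeatable results.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (assumption, not the course dataset)
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.75, 0.25], random_state=0)
train_X, test_X, train_Y, test_Y = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Hypothetical depth candidates and the two-column results array
depth_list = list(range(2, 15))
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list

# Run a for loop over the range of depth list length
for index in range(0, len(depth_list)):
    # Initialize and fit a decision tree with the max_depth candidate
    mytree = DecisionTreeClassifier(max_depth=depth_list[index],
                                    random_state=0)
    mytree.fit(train_X, train_Y)
    # Predict on the testing data
    pred_test_Y = mytree.predict(test_X)
    # Calculate the recall score and store it in the second column
    depth_tuning[index, 1] = recall_score(test_Y, pred_test_Y)

# Name the columns and print the array as a pandas DataFrame
col_names = ['Max_Depth', 'Recall']
results = pd.DataFrame(depth_tuning, columns=col_names)
print(results)
```

Note that recall_score takes the true labels first and the predictions second, and that predictions are made on test_X so the score reflects performance on unseen data rather than on the training set.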