Predict churn with a decision tree
Now you will build on the skills from the earlier exercise and build a more complex decision tree, with additional parameters, to predict customer churn. You will dive deeper into the churn prediction problem in the next chapter. Here you will fit the decision tree classifier to your training data again, predict churn on unseen (test) data, and assess model accuracy on both datasets.
The tree module from the sklearn library has been loaded for you, as well as the accuracy_score function from sklearn.metrics. The features and target variables have also been imported as train_X, train_Y for training data, and test_X, test_Y for test data.
This exercise is part of the course Machine Learning for Marketing in Python.
Exercise instructions
- Initialize a decision tree with the maximum depth set to 7, using the gini criterion.
- Fit the model to the training data.
- Predict the values on the test dataset.
- Print the accuracy values for both training and test datasets.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize the Decision Tree
clf = tree.DecisionTreeClassifier(max_depth=___,
                                  criterion='gini',
                                  splitter='best')
# Fit the model to the training data
clf = clf.___(train_X, train_Y)
# Predict the values on the test dataset
pred_Y = clf.___(test_X)
# Print accuracy values
print("Training accuracy: ", np.round(clf.score(train_X, train_Y), 3))
print("Test accuracy: ", np.round(___(test_Y, pred_Y), 3))
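
For readers who want to try the same workflow outside the exercise environment, the sketch below runs the four steps end to end. It is only an illustration under stated assumptions: the course's churn data is not available here, so a synthetic dataset from make_classification stands in for it, and the split sizes and random_state values are arbitrary choices that are not part of the exercise. The variable names train_X, train_Y, test_X, test_Y mirror the ones preloaded in the exercise.

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic binary-classification data stands in for the
# course's churn dataset, which is preloaded only in the exercise itself.
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=0)
train_X, test_X, train_Y, test_Y = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Initialize the decision tree (depth 7, gini criterion, as instructed).
# random_state is added here only to make the sketch reproducible.
clf = tree.DecisionTreeClassifier(max_depth=7,
                                  criterion='gini',
                                  splitter='best',
                                  random_state=0)

# Fit the model to the training data
clf = clf.fit(train_X, train_Y)

# Predict the values on the test dataset
pred_Y = clf.predict(test_X)

# Accuracy on both datasets: clf.score computes accuracy directly from
# features and labels, accuracy_score compares labels with predictions.
train_acc = clf.score(train_X, train_Y)
test_acc = accuracy_score(test_Y, pred_Y)
print("Training accuracy: ", np.round(train_acc, 3))
print("Test accuracy: ", np.round(test_acc, 3))
```

The two accuracy calls are interchangeable for a classifier: clf.score(test_X, test_Y) returns the same value as accuracy_score(test_Y, pred_Y). A training accuracy well above the test accuracy is a common sign that the tree is overfitting at the chosen depth.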