Predicting employee churn using decision trees

1. Predicting employee churn using decision trees

By now, we know how a decision tree works in theory. Let's apply this knowledge and use Python to predict employee churn.

2. Decision Tree in Python

To get the tree, we first need to import the necessary class and initialize it. For that, we will again use the already familiar `sklearn` library. Once it is imported, we initialize the long-named classifier, assign it to a friendlier name, and provide a parameter called `random_state`. This parameter sets the seed for the random choices the algorithm makes while growing the tree, so if you run the code a second time you will get the same results. The specific value is not important: it can be 1, 20 or anything else. What matters is using the same value whenever you need to reproduce the same results.

Once the model is set up, we can use the `fit()` method to fit our features to the target. As you remember, we used a train/test split so that we develop the model on the training component and then validate it on the test component. This lets us detect overfitting. For that reason, we use the `features_train` and `target_train` components for fitting. Once we run this piece of code, the tree is calculated and grown.

To test how good the tree's predictions are, we use a method called `score()`, which calculates the accuracy of the predictions. Again, because we developed the model on the training component, we calculate the accuracy score on the test component. The score shows the share of predictions that are correct. For example, a score of 0.65 means that our tree correctly predicted whether an employee will leave or stay in 65% of cases. To get a percentage, we just need to multiply the accuracy score by 100.
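The steps described above can be sketched roughly as follows. This is a minimal illustration, not the course's exact code: the employee dataset is not shown in this transcript, so the sketch assumes a synthetic dataset from `make_classification` as a stand-in, and the split size and the `random_state` value of 42 are arbitrary choices. The variable names `features_train`, `target_train`, `features_test` and `target_test` follow the ones used in the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data standing in for the employee features and
# the churn target, which are not available in this transcript.
features, target = make_classification(
    n_samples=1000, n_features=8, random_state=42
)

# Split into train and test components (25% held out for testing).
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.25, random_state=42
)

# Initialize the classifier under a friendlier name; a fixed
# random_state makes the result reproducible across runs.
model = DecisionTreeClassifier(random_state=42)

# Grow the tree on the training component only.
model.fit(features_train, target_train)

# score() returns accuracy as a fraction between 0 and 1;
# multiply by 100 to express it as a percentage.
accuracy = model.score(features_test, target_test)
print(f"Test accuracy: {accuracy * 100:.1f}%")
```

Calling `model.score(features_train, target_train)` as well and comparing the two numbers is one simple way to see overfitting: a training accuracy far above the test accuracy suggests the tree has memorized the training data.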

3. Let's practice!

OK, now it's your turn to calculate the accuracy scores.