Descriptive statistics

1. Descriptive statistics

So now our dataset is ready for developing a predictive algorithm. But before we do that, let's first get some quick descriptive insights.

2. Turnover rate

The variable that tells us whether an employee has left the company is the column **churn**. If its value is 1, the employee has churned; if it is 0, no turnover was observed for that employee. To calculate the turnover rate, we count how many times this variable takes the value 1 and the value 0, and divide each count by the total number of employees. Multiplying the result by 100 gives the percentage of employees who left and who stayed. This task is again accomplished in 3 steps:

- First, we get the number of all employees, which is simply the length of our data.
- Then, we count the 1s and 0s in the column **churn**.
- Finally, we divide the counted values by the number of employees and multiply by 100 to get percentages.

As you can see, around 76% of our employees stayed, while 24% have churned. Thus, we conclude that the turnover rate is 24%.
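The three steps above can be sketched in pandas as follows. This is a minimal sketch, not the course's exact code: it assumes the data sits in a DataFrame called `data` with a binary `churn` column. The eight-row DataFrame below is hypothetical toy data, so its percentages (75% stayed, 25% churned) do not match the real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the course dataset;
# only the `churn` column (1 = left, 0 = stayed) matters here.
data = pd.DataFrame({"churn": [0, 0, 0, 1, 0, 1, 0, 0]})

# Step 1: total number of employees (number of rows)
n_employees = len(data)

# Step 2: count how many 0s and 1s appear in the churn column
churn_counts = data["churn"].value_counts()

# Step 3: divide by the total and multiply by 100 to get percentages
churn_pct = churn_counts / n_employees * 100
print(churn_pct)
```

When `churn` has no missing values, `data["churn"].value_counts(normalize=True) * 100` returns the same percentages in one call, because `normalize=True` divides each count by the total.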

3. Correlations

Next, we want to find out which variables have a positive or negative linear relationship with our target. To do that, we first compute the correlation matrix using the `corr()` method provided by **pandas**, and then visualize the matrix using the `heatmap()` function from seaborn, a statistical visualization library. As you can see, the target variable **churn** has its strongest negative correlation with satisfaction level. This suggests that a higher satisfaction level is associated with a lower probability of turnover.
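A minimal sketch of this correlation analysis, assuming the data is in a DataFrame called `data` whose columns are all numeric. The column names `satisfaction` and `evaluation` and their values are hypothetical toy data, chosen so that satisfaction is negatively related to churn; they are not the course dataset. The `annot`, `cmap`, `vmin`, and `vmax` arguments to `heatmap()` are optional styling choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: employees who churned (1) tend to have low satisfaction
data = pd.DataFrame({
    "satisfaction": [0.9, 0.8, 0.2, 0.7, 0.1, 0.6, 0.3, 0.85],
    "evaluation":   [0.7, 0.6, 0.9, 0.5, 0.8, 0.7, 0.6, 0.75],
    "churn":        [0,   0,   1,   0,   1,   0,   1,   0],
})

# Pairwise (Pearson, by default) correlations between all columns
corr_matrix = data.corr()

# Correlations with the target, sorted from most negative to most positive
print(corr_matrix["churn"].sort_values())

# Visualize the full matrix; annot=True writes each coefficient into its cell
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.show()
```

If your DataFrame still contains non-numeric columns, pandas 2.x raises an error in `corr()` unless you drop those columns first or pass `numeric_only=True`.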

4. Let's practice!

Now it's your turn to practice.