Exploring the data

1. Know more about turnover

Data exploration is the first step in understanding your data.

2. Understanding the data

Here is a quick view of the dataset you imported. emp_id is the unique identifier for each employee. status shows the employment status of an employee: employees currently working in the organization are marked as active, while employees who have left are marked as inactive. The turnover variable carries the same information, where 0 stands for active and 1 for inactive. For inactive employees, the last working date in the organization is stored in the last_working_date column. For active employees, you will be using cutoff_date, which is the end date of the study period. Let's go ahead and calculate the turnover rate to derive insights from the data.

3. Calculating turnover rate

Turnover rate is the percentage of employees who left the organization in a given period of time. To calculate it, you need two numbers: the number of employees who left the organization during that period, i.e., the count of all 1's, and the total number of employees in the organization during that period, i.e., the combined count of all 1's and 0's. In other words, because turnover is coded as 0 or 1, the turnover rate is the mean of the turnover variable in your dataset.
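To see why the ratio and the mean agree, here is a minimal sketch in base R. The turnover vector below is made-up toy data, not the course dataset, and the variable names are chosen only for illustration.

```r
# Hypothetical toy vector: 1 = inactive (left), 0 = active (still employed)
turnover <- c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0)

# The two numbers described above
left  <- sum(turnover == 1)   # employees who left
total <- length(turnover)     # all employees (1's and 0's together)

# Turnover rate as a ratio, and as the mean of the 0/1 variable
rate_by_ratio <- left / total
rate_by_mean  <- mean(turnover)

print(rate_by_ratio)
print(rate_by_mean)
```

Both values come out the same, because summing a 0/1 variable counts the 1's and dividing by the length gives the share of 1's.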

4. Count Active and Inactive employees

First, let's look at the number of active and inactive employees using the count() function from dplyr. count() gives you the number of rows for each unique value in a specified column.
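A short sketch of this step might look as follows. The data frame name org, the employee IDs, and the exact spelling of the status values ("Active", "Inactive") are assumptions made for illustration; only the column names emp_id and status come from the dataset described earlier.

```r
library(dplyr)

# Hypothetical toy data standing in for the imported dataset
org <- tibble(
  emp_id = c("E1", "E2", "E3", "E4", "E5"),
  status = c("Active", "Inactive", "Active", "Active", "Inactive")
)

# count() returns one row per unique status, with the row count in column n
status_counts <- org %>%
  count(status)

print(status_counts)
```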

5. Calculate turnover rate

You can calculate the turnover rate using the summarize() function. As mentioned before, you can take the mean of the turnover column to accomplish this. Here you can see that approximately 18% of employees are inactive, which means about 82% of employees are active in the dataset.
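As a rough illustration, the call could be written like this. The data frame org and the output column name turnover_rate are hypothetical, and the toy values below will not reproduce the 18% figure from the actual dataset.

```r
library(dplyr)

# Hypothetical toy data: 1 = inactive, 0 = active
org <- tibble(
  emp_id   = c("E1", "E2", "E3", "E4", "E5"),
  turnover = c(0, 1, 0, 0, 0)
)

# summarize() collapses the data to one row holding the mean of turnover
org_rate <- org %>%
  summarize(turnover_rate = mean(turnover))

print(org_rate)
```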

6. Calculate turnover rate at each level

Turnover adversely affects the efficiency, productivity, profitability, and morale of the organization. To retain talent, it becomes imperative to find out where we are losing the most of it. Employee turnover rate can vary across job levels, so to calculate the level-wise turnover rate, you can group the data with the group_by() function before summarizing.
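One way to sketch the level-wise calculation is shown below. The names org and df_level, the "Manager" level, and all values are assumptions for illustration; the column names level and turnover_level follow the ones used in this lesson.

```r
library(dplyr)

# Hypothetical toy data with a job level for each employee
org <- tibble(
  level    = c("Analyst", "Analyst", "Specialist", "Specialist",
               "Manager", "Manager"),
  turnover = c(1, 0, 1, 0, 0, 0)
)

# group_by() splits the data by level, so mean() is computed per level
df_level <- org %>%
  group_by(level) %>%
  summarize(turnover_level = mean(turnover))

print(df_level)
```

The result has one row per level, which makes it easy to compare levels or pass the table on to a plot.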

7. Visualize the turnover trends using ggplot

Visualizing your data generally helps when you are comparing several values. You can plot a bar graph of the level-wise turnover rate using the geom_col() layer, placing level on the x-axis and turnover_level on the y-axis. As you can see here, the turnover rate is highest at the Analyst and Specialist levels.
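A minimal sketch of such a plot, assuming a small summary table named df_level (a hypothetical name) with made-up rates, could look like this. The numbers are illustrative only and are not the values from the course dataset.

```r
library(ggplot2)

# Hypothetical level-wise summary table with made-up turnover rates
df_level <- data.frame(
  level          = c("Analyst", "Specialist", "Manager"),
  turnover_level = c(0.5, 0.5, 0)
)

# geom_col() draws one bar per level, with bar height taken from turnover_level
p <- ggplot(df_level, aes(x = level, y = turnover_level)) +
  geom_col()

print(p)
```

geom_col() is used here rather than geom_bar() because the heights are already computed in the data, so no counting is needed.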

8. Let's practice!

Now it's your turn to explore this data!