Master data overview
So far you have combined information from the rating and survey datasets with your original dataset. We have also added several other employee-related variables, such as compensation, no_leaves_taken (number of vacation days taken), and hiring_source, to the dataset org_final. Go ahead and explore this dataset before doing feature engineering in the next chapter.
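This exercise does not show how the datasets were combined. One common way to do it in R is dplyr's left_join(). The sketch below is an illustration only: the toy rows are made up, and the shared key column emp_id and the columns rating and survey_score are hypothetical assumptions, not the course's actual schema.

```r
library(dplyr)

# Hypothetical toy tables; emp_id, rating and survey_score are assumed names
org <- data.frame(
  emp_id = c("E1", "E2", "E3"),
  status = c("Active", "Inactive", "Active"),
  distance_from_home = c(5, 24, 12)
)
rating <- data.frame(emp_id = c("E1", "E2", "E3"),
                     rating = c("Above", "Below", "Acceptable"))
survey <- data.frame(emp_id = c("E1", "E3"),
                     survey_score = c(0.8, 0.6))

# left_join() keeps every row of org; employees with no survey record get NA
org_combined <- org %>%
  left_join(rating, by = "emp_id") %>%
  left_join(survey, by = "emp_id")

print(org_combined)
```

A left join is used so that no employee from the original dataset is lost when a rating or survey record is missing; inner_join() would drop those employees instead.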
This exercise is part of the course HR Analytics: Predicting Employee Churn in R.
Exercise instructions
- Use glimpse() to view the structure of the org_final dataset.
- Assign the number of variables in the org_final dataset to variables.
- Generate a box plot to visualize the distribution of distance_from_home for Active and Inactive employees.
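The first two steps can be tried out on a small stand-in before working with the real data. In this sketch the data frame is a hypothetical toy version of org_final with made-up values and only six columns; the real dataset has many more. glimpse() comes from dplyr, and ncol() is base R.

```r
library(dplyr)

# Hypothetical toy stand-in for org_final (values are made up)
org_final <- data.frame(
  emp_id = c("E1", "E2", "E3", "E4"),
  status = c("Active", "Inactive", "Active", "Inactive"),
  compensation = c(52000, 41000, 63000, 38000),
  no_leaves_taken = c(10, 3, 14, 2),
  hiring_source = c("Job Fairs", "Consultant", "Job Portal", "Consultant"),
  distance_from_home = c(5, 24, 12, 30)
)

# glimpse() prints one line per column: its name, type and first values
glimpse(org_final)

# ncol() returns the number of columns, i.e. the number of variables
variables <- ncol(org_final)
print(variables)  # 6 for this toy data frame
```

length(org_final) would return the same value, because a data frame is stored as a list of columns; ncol() states the intent more clearly.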
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load the packages that provide glimpse() and ggplot()
library(dplyr)
library(ggplot2)
# View the structure of the dataset
___
# Number of variables in the dataset
variables <- ___
# Compare the travel distance of Active and Inactive employees
ggplot(org_final, aes(x = ___, y = ___)) +
___
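For the box plot, the categorical variable goes on the x axis and the numeric one on the y axis, so geom_boxplot() draws one box per group. The sketch below assumes, as a hypothetical, that the Active/Inactive label is stored in a column named status, and it uses made-up toy values in place of the real org_final.

```r
library(ggplot2)

# Hypothetical toy data; the status column name is an assumption
org_final <- data.frame(
  status = c("Active", "Active", "Active", "Inactive", "Inactive", "Inactive"),
  distance_from_home = c(4, 9, 12, 18, 25, 30)
)

# Grouping variable on x, numeric variable on y: one box per status
p <- ggplot(org_final, aes(x = status, y = distance_from_home)) +
  geom_boxplot()

print(p)
```

Comparing the medians and spreads of the two boxes gives a first impression of whether inactive employees tended to live farther from the office.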