Employee safety

1. Employee safety

Welcome to the last chapter of the course. You've learned how to look for and find differences between groups of employees, and how to check for omitted variables that could explain those differences. In this chapter, you'll use all you've learned in a final case study. In this case study, a senior executive believes that workplace accidents have increased this past year at the production sites, where accidents are more likely to happen. She wants you to find out if that's true, and if it is, to look into what might be driving the increase.

2. Employee safety

Workplace safety is important from the employee perspective, for obvious reasons. It's important from the HR perspective because HR is an employee advocate, and because accidents can increase turnover if employees leave to work somewhere safer. From a business perspective, accidents can increase the cost of workers' compensation and can bring legal expenses if the employer is to blame.

3. Joining with two keys

To tackle this analysis, you'll need to join the accidents dataset to your base HR dataset. The data spans both 2016 and 2017, so if you join on employee id alone, left_join() can't tell which row in hr_data an accident record belongs to. However, the combination of employee id and year is enough to identify rows uniquely. You can join on multiple columns, or keys, by passing them to the "by" argument of left_join().
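As a minimal sketch of the two-key join described above: the toy data, the name accident_data, the result name hr_joined, and the column names employee_id and year are assumptions for illustration, not the course's actual dataset. Only hr_data, accident_time, and left_join() come from the lesson.

```r
library(dplyr)

# Hypothetical HR data: one row per employee per year
hr_data <- tibble(
  employee_id = c(1, 1, 2, 2),
  year        = c(2016, 2017, 2016, 2017),
  location    = c("Plant A", "Plant A", "Plant B", "Plant B")
)

# Hypothetical accident data: only employees who had an accident that year
accident_data <- tibble(
  employee_id   = c(1, 2),
  year          = c(2017, 2016),
  accident_time = c("Day", "Night")  # made-up values
)

# Join on both keys so each accident lands on the matching employee-year row
hr_joined <- left_join(hr_data, accident_data, by = c("employee_id", "year"))

hr_joined
```

Every row of hr_data is kept, and rows without a matching accident get NA in accident_time.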

4. Dealing with NA

Since not all employees had an accident each year, the joined data frame will contain missing values. Filtering is a little different when missing values are involved. If you try to select all the rows where accident_time is missing by comparing it to NA with the double-equals operator, you will get no results. Instead, you should use the is.na() function, which will give you what you are looking for.
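The difference between the two filters can be sketched on a small, made-up data frame. The name hr_joined, the toy values, and the use of dplyr's filter() are assumptions for illustration.

```r
library(dplyr)

# Hypothetical joined data with missing accident_time values
hr_joined <- tibble(
  employee_id   = c(1, 1, 2, 2),
  year          = c(2016, 2017, 2016, 2017),
  accident_time = c(NA, "Day", "Night", NA)
)

# Comparing to NA yields NA for every row, and filter() keeps only TRUE rows,
# so this returns zero rows
no_match <- hr_joined %>% filter(accident_time == NA)
nrow(no_match)

# is.na() returns TRUE exactly where the value is missing
missing_rows <- hr_joined %>% filter(is.na(accident_time))
nrow(missing_rows)
```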

5. Why use is.na()?

Why won't the usual double-equals syntax work? Have a look for yourself. The double-equals gives what we expect for numbers and text. However, when you compare NA to NA, you don't get TRUE, you get another NA. This is because R doesn't know what NA is; it's a missing value. It doesn't know if those missing values are equal, so the result is another NA. Instead, to check if the value is NA, use is.na() to get the TRUE result you expect.
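The comparisons described above can be reproduced in base R. The variable names here are only for illustration.

```r
# Double-equals behaves as expected for numbers and text
num_equal  <- 1 == 1        # TRUE
text_equal <- "a" == "a"    # TRUE

# Comparing NA to NA does not give TRUE; the result is itself NA
na_equal <- NA == NA        # NA

# is.na() is the reliable check for a missing value
na_check <- is.na(NA)       # TRUE

print(c(num_equal, text_equal, na_equal, na_check))
```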

6. Let's practice!

Time to begin the final case study!
