
Focusing on the location of interest

1. Focusing on the location of interest

The location with the highest accident rate in 2017 is worth looking at, but if its accident rate was also high in 2016, that doesn't help to explain the overall increase in accidents. It's better to focus on Southfield, the location where the rate increased. In other chapters, you've compared two groups to each other within a given timeframe, such as new hires and current employees. Rather than comparing Southfield to other locations, this time you'll compare Southfield in 2017 to Southfield in 2016 to understand why the accident rate increased from one year to the next.
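As a rough illustration of that year-over-year comparison, here is a minimal dplyr sketch. The toy data frame, the second location name, and the column names (`year`, `location`, `had_accident`) are all made up for the example; the real dataset in the exercises may be structured differently.

```r
library(dplyr)

# Hypothetical toy data: one row per employee per year.
# Column names and values are assumptions, not the course dataset.
hr_toy <- tibble(
  year = c(2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017),
  location = c("Southfield", "Southfield", "Eastside", "Eastside",
               "Southfield", "Southfield", "Eastside", "Eastside"),
  had_accident = c(0, 0, 1, 0, 1, 1, 1, 0)
)

# Keep only Southfield, then compute the accident rate for each year
southfield_by_year <- hr_toy %>%
  filter(location == "Southfield") %>%
  group_by(year) %>%
  summarize(accident_rate = mean(had_accident))

southfield_by_year
```

Because the data is restricted to one location first, any difference between the two rows reflects a change over time rather than a difference between locations.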

2. Next steps

In the following exercises, you'll check to see if Southfield had other variables that changed significantly between 2016 and 2017. You'll bring in additional data as part of your investigation. If you find changes, you'll check that other locations didn't have the same change. For example, if you found that Southfield had an increased number of new hires in 2017, you might conclude that the inexperienced workers were causing the increased number of accidents. But, if the other locations had a similar influx of inexperienced workers, and their accident rate didn't increase very much, you would need to doubt that conclusion.
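The new-hire check described above could be sketched as follows. This is only an illustration under assumed names: the toy data, the `new_hire` column, and the `is_southfield` helper column are hypothetical and not taken from the course data.

```r
library(dplyr)

# Hypothetical toy data: new_hire is 1 for an employee hired that year.
hr_toy <- tibble(
  year = rep(c(2016, 2017), each = 4),
  location = rep(c("Southfield", "Southfield", "Eastside", "Eastside"),
                 times = 2),
  new_hire = c(0, 0, 0, 0, 1, 1, 1, 0)
)

# Share of new hires per year, for Southfield versus all other locations
new_hire_share <- hr_toy %>%
  mutate(is_southfield = location == "Southfield") %>%
  group_by(is_southfield, year) %>%
  summarize(share_new_hires = mean(new_hire), .groups = "drop")

new_hire_share
```

If the share rises by a similar amount in both groups while only Southfield's accident rate climbs, the new-hire explanation becomes much less convincing.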

3. Filtering out unwanted data

To test the other locations, you can use filter(). Instead of filtering to select a specific location, you'll filter to select the data that is not from that location. You can do this with the != ("not equal to") operator in R. Here is the hr_joined dataset, with the counts of the five possible engagement scores. If we use filter() to remove any engagement scores that equal 5, and count again, there are no more instances of engagement scores that equal 5. You can do the same thing with locations, or any other variable.
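A minimal sketch of this pattern, using a small made-up data frame in place of hr_joined (the `location` and `engagement` column names and all values are assumptions for illustration):

```r
library(dplyr)

# Hypothetical stand-in for hr_joined; the real columns may differ.
hr_toy <- tibble(
  location = c("Southfield", "Southfield", "Eastside", "Westside", "Eastside"),
  engagement = c(5, 3, 5, 4, 1)
)

# Count each engagement score before filtering
hr_toy %>%
  count(engagement)

# Keep only rows where engagement is NOT equal to 5, then count again
no_fives <- hr_toy %>%
  filter(engagement != 5)

no_fives %>%
  count(engagement)

# The same idea applied to locations: everything except Southfield
other_locations <- hr_toy %>%
  filter(location != "Southfield")
```

One thing to keep in mind is that filter() also drops rows where the condition evaluates to NA, so rows with a missing engagement score or location would be removed along with the excluded value.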

4. Let's practice!

Now, back to the final case study to see what you uncover.