Visualizing Maryland crime data
Before fitting a model, plot the data to see whether trends or individual data points jump out, whether outliers exist, or whether other attributes of the data require further consideration.
Using ggplot2, you can plot a line for each county and examine how crime changes through time.
For this exercise, examine Maryland crime data (md_crime). This includes the Year, a count of violent Crimes in the county, and the County's name.
To explore this data, first plot the data points for each county through time so you can see how each county changes. Rather than an aesthetic such as color, group is used here because there are too many counties to distinguish by color. After plotting the raw data, add trend lines for each county.
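As a rough illustration of the group aesthetic, the sketch below draws one line and one linear trend line per county. The data frame toy_crime and its values are made up for this example and stand in for md_crime, which is not reproduced here. Fitting straight trend lines with method = "lm" is an assumption rather than something the exercise specifies.

```r
library(ggplot2)

# Hypothetical toy data standing in for md_crime (values are invented)
toy_crime <- data.frame(
  County = rep(c("A", "B", "C"), each = 4),
  Year   = rep(2006:2009, times = 3),
  Crime  = c(120, 115, 110, 100,
             300, 310, 305, 320,
              50,  48,  45,  40)
)

# group = County draws a separate line per county without mapping
# each county to its own color
toy_plot <- ggplot(toy_crime, aes(x = Year, y = Crime, group = County)) +
  geom_line() +
  theme_minimal()

# Add one straight trend line per county; se = FALSE hides the
# confidence band (the linear method is an assumption of this sketch)
toy_trend <- toy_plot +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(toy_trend)
```

Because group is set in the top-level aes(), both geom_line and geom_smooth inherit it, so each layer is computed per county.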
Both the connected points (geom_line) and the trend lines (geom_smooth) provide insight into what, if any, kinds of random effects are required. If all of the points appear to have similar ranges and means, a random-effect intercept may not be important. Likewise, if trends look consistent across counties (i.e., the trend lines look similar or parallel across groups), a random-effect slope may not be required.
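The visual check described above can be backed with a quick numeric companion: fit a separate straight line to each county and compare the intercepts and slopes. This is only a sketch on the invented toy_crime data (a hypothetical stand-in for md_crime), using base R's lm. It is not the mixed-effects model itself, which would pool information across counties.

```r
# Hypothetical toy data standing in for md_crime (values are invented)
toy_crime <- data.frame(
  County = rep(c("A", "B", "C"), each = 4),
  Year   = rep(2006:2009, times = 3),
  Crime  = c(120, 115, 110, 100,
             300, 310, 305, 320,
              50,  48,  45,  40)
)

# Fit an independent linear trend per county; Year is shifted so the
# intercept is the fitted crime count in the first year (2006)
fits <- lapply(split(toy_crime, toy_crime$County), function(d) {
  coef(lm(Crime ~ I(Year - 2006), data = d))
})

# One row per county, with its own intercept and slope
county_coefs <- do.call(rbind, fits)
colnames(county_coefs) <- c("intercept", "slope")
print(county_coefs)

# Range of intercepts and slopes across counties
print(apply(county_coefs, 2, range))
```

Widely spread intercepts suggest a random-effect intercept is worth considering, and widely spread slopes suggest a random-effect slope. Tightly clustered values suggest the opposite.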
This exercise is part of the course Hierarchical and Mixed Effects Models in R.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Plot the change in crime through time by County
plot1 <-
  ggplot(data = md_crime,
         aes(x = ___, y = ___, group = ___)) +
  geom_line() +
  theme_minimal() +
  ylab("Major crimes reported per county")
print(plot1)
# Add the trend line for each county
plot1 + ___