Using simputation to impute data
There are many imputation packages in R. We are going to focus on the simputation package, which provides a simple, powerful interface for performing imputations.
Building a good imputation model is important, but it is a complex topic: there is as much to building a good imputation model as there is to building a good statistical model. In this course, we are going to focus on how to evaluate imputations.
First, we are going to look at the impute_lm() function, which imputes values according to a specified linear model.
In this exercise, we are going to apply the previous assessment techniques to data imputed with impute_lm(), and then build upon this imputation method in subsequent lessons.
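To make the idea behind impute_lm() concrete, here is a minimal sketch on a small made-up data frame. The data frame toy and its columns x and y are hypothetical and only serve as an illustration; they are not part of the course data.

```r
library(simputation)

# Hypothetical toy data: y follows x linearly, and two y values are missing
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 4, NA, 8, NA, 12)
)

# Fit the linear model y ~ x on the rows where y is observed,
# then fill the missing y values with the model's predictions
toy_imp <- impute_lm(toy, y ~ x)

toy_imp
```

The formula reads like one passed to lm(): the variable to impute goes on the left, and the predictors go on the right. Values that were already observed are left unchanged.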
This exercise is part of the course
Dealing With Missing Data in R
Exercise instructions
Using the oceanbuoys dataset:
- Impute humidity using wind_ew and wind_ns, and track missing values using add_label_shadow().
- Plot the imputed values for air_temp_c and humidity, putting them on the x and y-axis, respectively, and coloring by any_missing.
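As a rough illustration of how missing values are tracked, the sketch below applies bind_shadow() and add_label_shadow() from naniar to R's built-in airquality data. Using airquality here is an assumption made for illustration only; the exercise itself uses oceanbuoys.

```r
library(naniar)
library(dplyr)

# bind_shadow() adds one "_NA" column per variable recording whether each
# value was missing; add_label_shadow() then adds an any_missing column
# that labels rows containing at least one missing value
aq_shadow <- airquality %>%
  bind_shadow() %>%
  add_label_shadow()

names(aq_shadow)
```

Because the shadow columns are created before any imputation, they keep a record of which values were originally missing even after those values have been filled in.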
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Impute humidity and air temperature using wind_ew and wind_ns, and track missing values
ocean_imp_lm_wind <- ___ %>%
  bind_shadow() %>%
  impute_lm(air_temp_c ~ wind_ew + wind_ns) %>%
  impute_lm(___ ~ ___ + ___) %>%
  add_label_shadow()

# Plot the imputed values for air_temp_c and humidity, colored by missingness
ggplot(___,
       aes(x = ___, y = ___, color = any_missing)) +
  geom_point()
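One possible completion of the sample code, based only on the instructions above, is sketched below. It assumes that oceanbuoys is the dataset shipped with the naniar package and that the listed packages are installed; the course's own reference solution may differ.

```r
library(naniar)       # oceanbuoys, bind_shadow(), add_label_shadow()
library(simputation)  # impute_lm()
library(dplyr)        # %>%
library(ggplot2)

# Impute humidity and air temperature using wind_ew and wind_ns,
# and track missing values
ocean_imp_lm_wind <- oceanbuoys %>%
  bind_shadow() %>%
  impute_lm(air_temp_c ~ wind_ew + wind_ns) %>%
  impute_lm(humidity ~ wind_ew + wind_ns) %>%
  add_label_shadow()

# Plot the imputed values for air_temp_c and humidity, colored by missingness
p <- ggplot(ocean_imp_lm_wind,
            aes(x = air_temp_c, y = humidity, color = any_missing)) +
  geom_point()

p
```

In the resulting plot, points labelled as missing are those with at least one originally missing value, so they include the imputed observations and can be compared against the observed ones.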