Using simputation to impute data
There are many imputation packages in R. We are going to focus on the simputation package, which provides a simple, powerful interface for performing imputations.
Building a good imputation model is important, but it is a complex topic: there is as much to building a good imputation model as there is to building a good statistical model. In this course, we are going to focus on how to evaluate imputations.
First, we are going to look at the impute_lm() function, which imputes values according to a specified linear model.
In this exercise, we are going to apply the previous assessment techniques to data imputed with impute_lm(), and then build upon this imputation method in subsequent lessons.
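As a rough illustration of what imputing "according to a specified linear model" means, the sketch below runs impute_lm() on a small made-up data frame. The data frame toy and its columns x and y are hypothetical and exist only for this example. The model is fitted on the complete rows, and the missing y is replaced by the model's prediction.

```r
library(simputation)

# Hypothetical toy data (not from the course): y is exactly 2 * x,
# with one value missing.
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 4, NA, 8, 10)
)

# The left-hand side of the formula names the variable to impute;
# the right-hand side names the predictors of the linear model.
imputed <- impute_lm(toy, y ~ x)

# The missing value is filled with the fitted line's prediction at x = 3.
imputed$y
```

Because the observed points lie on a straight line, the filled-in value should come out at about 6. Real data will not be this tidy, which is why the imputations need to be evaluated.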
This exercise is part of the course Dealing With Missing Data in R
Exercise instructions
Using the oceanbuoys dataset:
- Impute humidity using wind_ew and wind_ns, and track missing values using add_label_shadow().
- Plot the imputed values for air_temp_c and humidity, putting them on the x and y-axis, respectively, and coloring by the any_missing column.
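To make the tracking step more concrete, here is a minimal sketch of what bind_shadow() and add_label_shadow() from naniar add to a data frame. The data frame toy and its columns a and b are hypothetical, chosen only to keep the output small.

```r
library(naniar)

# Hypothetical toy data (not from the course): row 2 is missing `a`,
# row 3 is missing `b`.
toy <- data.frame(
  a = c(1, NA, 3),
  b = c(10, 20, NA)
)

# bind_shadow() appends one "shadow" column per variable (a_NA, b_NA)
# recording whether each value was missing.
shadowed <- bind_shadow(toy)

# add_label_shadow() adds an any_missing column that labels each row
# "Missing" if any of its shadow columns records a missing value.
tracked <- add_label_shadow(shadowed)

names(tracked)
tracked$any_missing
```

The label is derived from the shadow columns rather than from the data values, so it should still mark the originally missing rows after those values have been imputed.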
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Impute humidity and air temperature using wind_ew and wind_ns, and track missing values
ocean_imp_lm_wind <- ___ %>%
  bind_shadow() %>%
  impute_lm(air_temp_c ~ wind_ew + wind_ns) %>%
  impute_lm(___ ~ ___ + ___) %>%
  add_label_shadow()
# Plot the imputed values for air_temp_c and humidity, colored by missingness
ggplot(___,
       aes(x = ___, y = ___, color = any_missing)) +
  geom_point()
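The same track, impute, label, plot pattern can be tried end to end on a small made-up data frame before filling in the blanks above. This is only a sketch: the data frame toy and its columns driver, temp and humid are hypothetical stand-ins, not the oceanbuoys variables, and the steps are written as separate assignments rather than a pipe.

```r
library(naniar)
library(simputation)
library(ggplot2)

# Hypothetical toy data (not oceanbuoys): one missing value in each of
# temp and humid, with a fully observed predictor.
toy <- data.frame(
  driver = c(1, 2, 3, 4, 5, 6),
  temp   = c(10, 12, NA, 16, 18, 20),
  humid  = c(80, NA, 76, 74, 72, 70)
)

# 1. Record the original missingness in shadow columns.
toy_shadow <- bind_shadow(toy)

# 2. Impute each variable with its own linear model.
toy_temp <- impute_lm(toy_shadow, temp ~ driver)
toy_both <- impute_lm(toy_temp, humid ~ driver)

# 3. Label rows that originally had any missing value.
toy_imp <- add_label_shadow(toy_both)

# 4. Plot the imputed data, coloring points by the any_missing label.
p <- ggplot(toy_imp, aes(x = temp, y = humid, color = any_missing)) +
  geom_point()
p
```

Coloring by any_missing separates the imputed points from the observed ones, which makes it easier to judge whether the imputed values follow the pattern of the observed data.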