
Using simputation to impute data

There are many imputation packages in R. We are going to focus on the simputation package, which provides a simple, powerful interface for performing imputations.

Building a good imputation model is important, but it is a complex topic: there is as much to building a good imputation model as there is to building a good statistical model. In this course, we are going to focus on how to evaluate imputations.

First, we are going to look at the impute_lm() function, which imputes values according to a specified linear model.
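
As a minimal sketch of what impute_lm() does, the example below uses R's built-in airquality data rather than the exercise data. The dataset and the choice of Wind and Temp as predictors are assumptions made purely for illustration. The call fits a linear model of Ozone on Wind and Temp and fills the missing Ozone values with the model's predictions.

```r
library(simputation)

# airquality ships with R; Ozone has missing values, Wind and Temp do not
sum(is.na(airquality$Ozone))

# Impute Ozone from a linear model on Wind and Temp
air_imp <- impute_lm(airquality, Ozone ~ Wind + Temp)

# Ozone is now complete; observed values and other columns are left as they were
sum(is.na(air_imp$Ozone))
```

Because the predictors here are fully observed, every missing Ozone value can be predicted. If a predictor were also missing in a row, that row would typically stay missing after this step.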

In this exercise, we are going to apply the previous assessment techniques to data imputed with impute_lm(), and then build upon this imputation method in subsequent lessons.

This exercise is part of the course

Dealing With Missing Data in R


Exercise instructions

Using the oceanbuoys dataset:

  • Impute humidity and air_temp_c using wind_ew and wind_ns, and track missing values using add_label_shadow().
  • Plot the imputed values for air_temp_c and humidity, putting them on the x- and y-axes, respectively, and coloring by any_missing.
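
The same track, impute, label, plot workflow can be sketched on a different dataset, so the exercise itself is left for you to complete. This is an illustrative example that assumes the built-in airquality data and the hypothetical object names air, air_shadow, air_filled, and air_imp; it is not the exercise solution.

```r
library(naniar)
library(simputation)
library(ggplot2)

# Keep three columns of the built-in airquality data; only Ozone has NAs
air <- airquality[, c("Ozone", "Wind", "Temp")]

# bind_shadow() records which values were missing before imputation
air_shadow <- bind_shadow(air)

# Fill missing Ozone values from a linear model on Wind and Temp
air_filled <- impute_lm(air_shadow, Ozone ~ Wind + Temp)

# add_label_shadow() adds an any_missing column summarising the shadow columns
air_imp <- add_label_shadow(air_filled)

# Color separates rows that had a missing value from fully observed rows
ggplot(air_imp, aes(x = Wind, y = Ozone, color = any_missing)) +
  geom_point()
```

The shadow columns must be bound before imputing. Once the NA values are filled in, there is nothing left in the data itself to say which points were imputed.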

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Impute humidity and air temperature using wind_ew and wind_ns, and track missing values
ocean_imp_lm_wind <- ___ %>% 
    bind_shadow() %>%
    impute_lm(air_temp_c ~ wind_ew + wind_ns) %>% 
    impute_lm(___ ~ ___ + ___) %>%
    add_label_shadow()
    
# Plot the imputed values for air_temp_c and humidity, colored by missingness
ggplot(___, 
       aes(x = ___, y = ___, color = any_missing)) + 
  geom_point()