
Combining and comparing many imputation models

To evaluate the different imputation methods, we need to put them into a single dataframe. In this exercise, you will compare three approaches to handling missing data in the oceanbuoys dataset:

  • The first method uses only the complete cases and is loaded as ocean_cc.
  • The second method imputes values with a linear model whose predictions are made from the wind variables (wind_ew and wind_ns), and is loaded as ocean_imp_lm_wind.
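The two preloaded datasets are not built in this exercise. The following is a minimal sketch of how they could be constructed, assuming the naniar, simputation, and dplyr packages and assuming the course prepared them roughly this way; its actual preparation code may differ (for example, in which shadow or label columns it adds).

```r
library(naniar)      # oceanbuoys, bind_shadow(), add_label_shadow()
library(simputation) # impute_lm()
library(dplyr)       # %>%

# Assumed construction of ocean_cc: drop every row containing an NA,
# then add shadow (_NA) columns so its columns line up with the imputed datasets
ocean_cc <- oceanbuoys %>%
  na.omit() %>%
  bind_shadow()

# Assumed construction of ocean_imp_lm_wind: linear-model imputation
# that uses only the two wind variables as predictors
ocean_imp_lm_wind <- oceanbuoys %>%
  bind_shadow() %>%
  add_label_shadow() %>%
  impute_lm(sea_temp_c ~ wind_ew + wind_ns) %>%
  impute_lm(air_temp_c ~ wind_ew + wind_ns) %>%
  impute_lm(humidity ~ wind_ew + wind_ns)
```

Note the trade-off the comparison is meant to expose: the complete-case dataset has fewer rows than oceanbuoys, while the imputed dataset keeps every row but fills gaps with model predictions.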

You will create the third imputed dataset, ocean_imp_lm_all, by using linear models to impute sea_temp_c, air_temp_c, and humidity from the variables wind_ew, wind_ns, year, latitude, and longitude.

You will then bind all three datasets (ocean_cc, ocean_imp_lm_wind, and ocean_imp_lm_all) into a single dataframe called bound_models.
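How the binding step labels rows can be seen on a small scale. This sketch uses hypothetical toy tibbles (not the real oceanbuoys values) and dplyr's bind_rows(): the argument names become the values of the .id column, a column missing from one input (here any_missing, absent from the complete-case stand-in) is filled with NA, and once everything sits in one dataframe a single grouped summary compares the approaches.

```r
library(dplyr)

# Hypothetical toy stand-ins for the three datasets.
# Only the "imputed" ones carry an any_missing label column.
cc          <- tibble(humidity = c(80, 86))
imp_lm_wind <- tibble(humidity = c(80, 83, 86),
                      any_missing = c("Not Missing", "Missing", "Not Missing"))
imp_lm_all  <- tibble(humidity = c(80, 85, 86),
                      any_missing = c("Not Missing", "Missing", "Not Missing"))

# Argument names become the values of the new imp_model column;
# any_missing is NA for the rows that came from cc
bound <- bind_rows(cc = cc,
                   imp_lm_wind = imp_lm_wind,
                   imp_lm_all = imp_lm_all,
                   .id = "imp_model")

# One grouped summary compares row counts and means across approaches
bound_summary <- bound %>%
  group_by(imp_model) %>%
  summarise(n = n(), mean_humidity = mean(humidity))
bound_summary
```

Here the complete-case group has 2 rows with a mean humidity of 83, while each imputed group keeps all 3 rows.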

This exercise is part of the course Dealing With Missing Data in R.


Exercise instructions

  • Create an imputed dataset named ocean_imp_lm_all by using linear models to impute sea_temp_c, air_temp_c, and humidity from the variables wind_ew, wind_ns, year, latitude, and longitude.
  • Bind all three datasets (ocean_cc, ocean_imp_lm_wind, and ocean_imp_lm_all) into a single object called bound_models.


Sample code for this exercise:

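# Load the packages used below
# (ocean_cc and ocean_imp_lm_wind are preloaded in the exercise environment)
library(naniar)      # oceanbuoys, bind_shadow(), add_label_shadow()
library(simputation) # impute_lm()
library(dplyr)       # %>%, bind_rows()

# Create an imputed dataset using linear models with all five predictors
ocean_imp_lm_all <- bind_shadow(oceanbuoys) %>%
  add_label_shadow() %>%
  impute_lm(sea_temp_c ~ wind_ew + wind_ns + year + latitude + longitude) %>%
  impute_lm(air_temp_c ~ wind_ew + wind_ns + year + latitude + longitude) %>%
  impute_lm(humidity ~ wind_ew + wind_ns + year + latitude + longitude)

# Bind the datasets, recording each row's source in the imp_model column
bound_models <- bind_rows(cc = ocean_cc,
                          imp_lm_wind = ocean_imp_lm_wind,
                          imp_lm_all = ocean_imp_lm_all,
                          .id = "imp_model")
# Look at the models
bound_models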