Combining and comparing many imputation models
To evaluate different imputation methods, we need to combine their results into a single data frame. In this exercise, you will compare three approaches to handling missing data in the oceanbuoys dataset.
- The first method uses only the complete cases and is loaded as ocean_cc.
- The second method imputes values using a linear model with predictions based on wind and is loaded as ocean_imp_lm_wind.
You will create the third imputed dataset, ocean_imp_lm_all, using a linear model to impute the variables sea_temp_c, air_temp_c, and humidity from the variables wind_ew, wind_ns, year, latitude, and longitude.
You will then bind all of the datasets together (ocean_cc, ocean_imp_lm_wind, and ocean_imp_lm_all) into a single object called bound_models.
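Before imputing, it can help to check which variables actually contain missing values. The following is a minimal sketch, assuming the naniar package (which ships the oceanbuoys data) is installed. The names miss_summary and vars_with_missing are introduced here for illustration only.

```r
library(naniar)  # provides the oceanbuoys data and miss_var_summary()

# One row per variable, with the number and percentage of missing values
miss_summary <- miss_var_summary(oceanbuoys)
miss_summary

# Names of the variables that have at least one missing value
vars_with_missing <- miss_summary$variable[miss_summary$n_miss > 0]
vars_with_missing
```

Variables that show up in vars_with_missing are candidates for imputation. A predictor in an imputation formula should be observed wherever a value has to be imputed, otherwise no prediction can be made for that row.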
This exercise is part of the course Dealing With Missing Data in R.
Exercise instructions
- Create an imputed dataset named ocean_imp_lm_all using a linear model and impute the variables sea_temp_c, air_temp_c, and humidity using the variables wind_ew, wind_ns, year, latitude, and longitude.
- Bind all of the datasets together into the same object, calling it bound_models.
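The binding step relies on the .id argument of dplyr's bind_rows(), which records the source of each row. A small sketch with made-up data (model_a, model_b, and stacked are hypothetical names, not part of the exercise) shows the mechanism:

```r
library(dplyr)

# Two tiny, made-up data frames standing in for differently imputed datasets
model_a <- tibble(x = c(1, 2), y = c(10, 20))
model_b <- tibble(x = c(1, 2), y = c(11, 19))

# The argument names ("a" and "b") become the values of the new id column,
# which bind_rows() places first in the result
stacked <- bind_rows(a = model_a, b = model_b, .id = "imp_model")
stacked
```

Because every row carries its model label, the stacked data can later be grouped, faceted, or coloured by imp_model to compare the methods side by side.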
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
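# Assumes naniar, simputation, and dplyr are loaded
library(naniar)       # oceanbuoys, bind_shadow(), add_label_shadow()
library(simputation)  # impute_lm()
library(dplyr)        # %>% and bind_rows()

# Create an imputed dataset using a linear model
ocean_imp_lm_all <- bind_shadow(oceanbuoys) %>%
  add_label_shadow() %>%
  impute_lm(sea_temp_c ~ wind_ew + wind_ns + ___ + ___ + ___) %>%
  impute_lm(air_temp_c ~ wind_ew + wind_ns + ___ + ___ + ___) %>%
  impute_lm(humidity ~ wind_ew + wind_ns + ___ + ___ + ___)

# Bind the datasets; the argument names become the values of imp_model
bound_models <- bind_rows(cc = ___,
                          imp_lm_wind = ___,
                          imp_lm_all = ___,
                          .id = "imp_model")

# Look at the models
bound_models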
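One way the template might be completed, following the exercise instructions, is sketched below. The definitions of ocean_cc and ocean_imp_lm_wind are assumptions: the course preloads these objects, and the code here is only a plausible reconstruction so that the sketch runs on its own.

```r
library(dplyr)
library(naniar)
library(simputation)

# ASSUMPTION: plausible reconstruction of the preloaded complete-case data
ocean_cc <- oceanbuoys %>%
  na.omit() %>%
  bind_shadow() %>%
  add_label_shadow()

# ASSUMPTION: plausible reconstruction of the preloaded wind-only imputation
ocean_imp_lm_wind <- bind_shadow(oceanbuoys) %>%
  add_label_shadow() %>%
  impute_lm(sea_temp_c ~ wind_ew + wind_ns) %>%
  impute_lm(air_temp_c ~ wind_ew + wind_ns) %>%
  impute_lm(humidity ~ wind_ew + wind_ns)

# Third dataset: add year, latitude, and longitude as predictors
ocean_imp_lm_all <- bind_shadow(oceanbuoys) %>%
  add_label_shadow() %>%
  impute_lm(sea_temp_c ~ wind_ew + wind_ns + year + latitude + longitude) %>%
  impute_lm(air_temp_c ~ wind_ew + wind_ns + year + latitude + longitude) %>%
  impute_lm(humidity ~ wind_ew + wind_ns + year + latitude + longitude)

# Stack the three datasets, labelling each row with its model
bound_models <- bind_rows(cc = ocean_cc,
                          imp_lm_wind = ocean_imp_lm_wind,
                          imp_lm_all = ocean_imp_lm_all,
                          .id = "imp_model")

# Number of rows contributed by each approach
count(bound_models, imp_model)
```

The complete-case dataset should contribute fewer rows than the two imputed datasets, since rows with any missing value are dropped rather than filled in.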