Using simputation to impute data
There are many imputation packages in R. We are going to focus on the simputation package, which provides a simple, powerful interface for performing imputations.
Building a good imputation model is important but complex: it takes as much care as building a good statistical model. In this course, we are going to focus on how to evaluate imputations.
First, we are going to look at the impute_lm() function, which imputes values according to a specified linear model.
In this exercise, we are going to apply the previous assessment techniques to data imputed with impute_lm(), and then build upon this imputation method in subsequent lessons.
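As a rough illustration of what impute_lm() does, here is a minimal sketch on R's built-in airquality data rather than the exercise data. The choice of dataset, the predictors, and the name aq_imp are assumptions made for illustration only. Ozone has missing values, while Temp and Wind are fully observed, so every missing Ozone value can be filled in from the fitted linear model.

```r
library(simputation)

# airquality ships with base R; Ozone contains NAs, Temp and Wind do not
sum(is.na(airquality$Ozone))

# Fit Ozone ~ Temp + Wind on the observed rows and use the model's
# predictions to fill in the missing Ozone values
aq_imp <- impute_lm(airquality, Ozone ~ Temp + Wind)

# Missing values in Ozone are filled in; observed values are left untouched
sum(is.na(aq_imp$Ozone))
```

Note that impute_lm() can only fill in a value when the predictors for that row are observed, which is why fully observed predictors were chosen here.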
This exercise is part of the course
Dealing With Missing Data in R
Instructions
Using the oceanbuoys dataset:
- Impute humidity using wind_ew and wind_ns, and track missing values using add_label_shadow().
- Plot the imputed values for air_temp_c and humidity, putting them on the x and y-axis, respectively, and coloring by any_missing.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Impute humidity and air temperature using wind_ew and wind_ns, and track missing values
ocean_imp_lm_wind <- ___ %>%
  bind_shadow() %>%
  impute_lm(air_temp_c ~ wind_ew + wind_ns) %>%
  impute_lm(___ ~ ___ + ___) %>%
  add_label_shadow()

# Plot the imputed values for air_temp_c and humidity, colored by missingness
ggplot(___,
       aes(x = ___, y = ___, color = any_missing)) +
  geom_point()
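
The same track, impute, and plot pattern can be sketched on a different dataset. The example below is an illustration under stated assumptions, not the exercise solution: it uses R's built-in airquality data, and the names aq_imp_lm and p are hypothetical. The shadow columns are bound before imputing, so add_label_shadow() can still flag which rows originally had missing values after they have been filled in.

```r
library(dplyr)        # provides the %>% pipe
library(naniar)       # bind_shadow(), add_label_shadow()
library(simputation)  # impute_lm()
library(ggplot2)

# Record missingness first, then impute Ozone and Solar.R from the
# fully observed Temp and Wind, then label rows that had any missing value
aq_imp_lm <- airquality %>%
  bind_shadow() %>%
  impute_lm(Ozone ~ Temp + Wind) %>%
  impute_lm(Solar.R ~ Temp + Wind) %>%
  add_label_shadow()

# Plot the two imputed variables, coloring points by original missingness
p <- ggplot(aq_imp_lm,
            aes(x = Ozone, y = Solar.R, color = any_missing)) +
  geom_point()
p
```

Coloring by any_missing makes the imputed points stand out from the observed ones, which is what lets you judge whether the linear model produces plausible values.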