Impute data below range with nabular data
We want to keep track of values we imputed. If we don't, it is very difficult to assess how good the imputed values are.
We are going to practice imputing data and recreate the visualizations from the previous set of exercises by imputing values below the range of the data.
This is a very useful way to help further explore missingness, and also provides the framework for imputing missing values.
First, we are going to impute the data below the range using impute_below_all(), and then visualize the data. We notice that although we can see where the missing values are in this instance, we need some way to track them. The "track missing data" programming pattern can help with this.
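To make "imputing below the range" concrete, here is a minimal sketch on a small made-up data frame (the `toy` data and its column names are hypothetical, not part of the course). It assumes the naniar package is installed and uses the default arguments of `impute_below_all()`.

```r
library(naniar)

# Hypothetical toy data with one missing value per column
toy <- data.frame(
  x = c(1, 2, NA, 4, 5),
  y = c(10, NA, 30, 40, 50)
)

# Replace each missing value with a value below that column's observed minimum
toy_imp <- impute_below_all(toy)

# Smallest observed values before imputation
min(toy$x, na.rm = TRUE)
min(toy$y, na.rm = TRUE)

# The imputed entries, which should sit below those minimums
toy_imp$x[is.na(toy$x)]
toy_imp$y[is.na(toy$y)]
```

Once the values are filled in, nothing in `toy_imp` itself records which entries were imputed; we can only tell here because we still have the original `toy` to compare against. That is the gap the tracking step closes.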
This exercise is part of the course
Dealing With Missing Data in R
Exercise instructions
Using the oceanbuoys data:
- Impute below the range using impute_below_all().
- Visualize the new missing values for wind_ew on the x-axis and air_temp_c on the y-axis.
- Impute and track data with bind_shadow(), impute_below_all(), and add_label_shadow().
- Show the plot and inspect the imputed values.
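As an illustration of the same impute-and-track pattern on a different dataset, the sketch below uses R's built-in `airquality` data instead of oceanbuoys, so it does not fill in the exercise blanks. It assumes naniar, dplyr, and ggplot2 are installed; the object name `aq_imp_track` is made up for this example.

```r
library(naniar)
library(dplyr)
library(ggplot2)

# Track first, then impute, then label:
# bind_shadow() adds one *_NA column per variable recording missingness,
# impute_below_all() fills missing values below each variable's range,
# add_label_shadow() adds an any_missing column summarising each row.
aq_imp_track <- airquality %>%
  bind_shadow() %>%
  impute_below_all() %>%
  add_label_shadow()

# Colour the points by whether the row had any missing value
ggplot(aq_imp_track,
       aes(x = Ozone, y = Solar.R, colour = any_missing)) +
  geom_point()
```

Because the shadow columns are bound before imputation, the record of which values were originally missing survives the imputation step and can drive the plot's colour.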
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Impute the oceanbuoys data below the range using `impute_below_all`.
ocean_imp <- impute_below_all(___)
# Visualize the new missing values
ggplot(___,
       aes(x = ___, y = ___)) +
  geom_point()
# Impute and track data with `bind_shadow`, `impute_below_all`, and `add_label_shadow`
ocean_imp_track <- bind_shadow(___) %>%
  ___() %>%
  ___()
# Look at the imputed values
___