Using simputation to impute data

There are many imputation packages in R. We are going to focus on the simputation package, which provides a simple, powerful interface for performing imputations.

Building a good imputation model is important, but it is a complex topic: there is as much to building a good imputation model as there is to building a good statistical model. In this course, we are going to focus on how to evaluate imputations.

First, we are going to look at the impute_lm() function, which imputes values according to a specified linear model.
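As a rough illustration of what imputing "according to a specified linear model" means, here is a minimal sketch on R's built-in airquality data rather than the exercise data. The choice of dataset and of Wind and Temp as predictors is an assumption made only for this example. The formula's left-hand side names the variable to impute, and the right-hand side names the predictors.

```r
library(simputation)

# airquality ships with base R; Ozone has missing values,
# while Wind and Temp (the predictors chosen here) are complete
aq <- airquality
aq$Ozone <- as.numeric(aq$Ozone)  # store Ozone as double before imputing

n_missing_before <- sum(is.na(aq$Ozone))

# Fit Ozone ~ Wind + Temp on the rows where Ozone is observed,
# then replace the missing Ozone values with the model's predictions
aq_imp <- impute_lm(aq, Ozone ~ Wind + Temp)

n_missing_after <- sum(is.na(aq_imp$Ozone))

print(c(before = n_missing_before, after = n_missing_after))
```

Observed values are left as they were; only the missing entries of the left-hand-side variable are filled in.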

In this exercise, we are going to apply the previous assessment techniques to data imputed with impute_lm(), and then build upon this imputation method in subsequent lessons.

This exercise is part of the course

Dealing With Missing Data in R

Exercise instructions

Using the oceanbuoys dataset:

  • Impute humidity using wind_ew and wind_ns, and track missing values using add_label_shadow().
  • Plot the imputed values for air_temp_c and humidity, putting them on the x- and y-axis, respectively, and coloring by the any_missing column.
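To make the "tracking" step more concrete, the sketch below shows what the naniar helpers add to a data frame, again using the built-in airquality data as a stand-in (an assumption for illustration only). bind_shadow() appends one shadow column per variable, named with an "_NA" suffix, and add_label_shadow() then adds an any_missing column that labels each row by whether any of its values were missing.

```r
library(naniar)

# Append a shadow column for every variable, recording where values are missing
aq_shadow <- bind_shadow(airquality)

# Shadow columns reuse the original names with an "_NA" suffix
shadow_cols <- grep("_NA$", names(aq_shadow), value = TRUE)
print(shadow_cols)

# Add a row-level label summarising whether any value in the row was missing
aq_labelled <- add_label_shadow(aq_shadow)
print(table(aq_labelled$any_missing))
```

Because the shadow columns are created before any imputation, the any_missing label still identifies the originally incomplete rows after the missing values have been filled in.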

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Impute humidity and air temperature using wind_ew and wind_ns, and track missing values
ocean_imp_lm_wind <- ___ %>% 
    bind_shadow() %>%
    impute_lm(air_temp_c ~ wind_ew + wind_ns) %>% 
    impute_lm(___ ~ ___ + ___) %>%
    add_label_shadow()
    
# Plot the imputed values for air_temp_c and humidity, colored by missingness
ggplot(___, 
       aes(x = ___, y = ___, color = any_missing)) + 
  geom_point()