Evaluating imputations (many models & variables)

When you build up an imputation model, it's a good idea to compare it to another method.

In this lesson, you will add a final imputation model that includes an extra piece of information, year, which helps explain some of the variation in the data. You will then compare the imputed values across models, as in the previous lesson.

This exercise is part of the course

Dealing With Missing Data in R

Instructions

Using the oceanbuoys dataset:

Impute data using impute_lm(), adding year to the model.
Bind the imputation methods together, placing ocean_imp_mean into mean, ocean_imp_lm_wind into lm_wind, and ocean_imp_lm_wind_year into lm_wind_year.
Look at the values of air_temp_c (on the x-axis) and humidity (on the y-axis), coloring by any missings, and faceting by imputation model.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Impute with linear models, adding year as a predictor
ocean_imp_lm_wind_year <- bind_shadow(___) %>%
  impute_lm(air_temp_c ~ wind_ew + wind_ns + ___) %>%
  impute_lm(humidity ~ wind_ew + wind_ns + ___) %>%
  add_label_shadow()

# Bind the mean, lm_wind, and lm_wind_year models together
bound_models <- bind_rows(mean = ocean_imp_mean,
                          lm_wind = ocean_imp_lm_wind,
                          lm_wind_year = ___,
                          .id = "imp_model")

# Explore air_temp and humidity, coloring by any missings, and faceting by imputation model
ggplot(___, aes(x = ___, y = ___, color = any_missing)) + 
  geom_point() + facet_wrap(~___)
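For reference, one way the template above might be completed is sketched below. It assumes the naniar, simputation, dplyr, and ggplot2 packages are installed. The objects ocean_imp_mean and ocean_imp_lm_wind come from earlier exercises that are not shown here, so the way they are rebuilt in this sketch is an assumption, not the course's own code.

```r
library(naniar)       # oceanbuoys, bind_shadow(), add_label_shadow(), impute_mean()
library(simputation)  # impute_lm()
library(dplyr)        # %>%, mutate(), across(), bind_rows()
library(ggplot2)

# Assumption: mean imputation of every numeric column (stand-in for the earlier exercise)
ocean_imp_mean <- bind_shadow(oceanbuoys) %>%
  mutate(across(where(is.numeric), impute_mean)) %>%
  add_label_shadow()

# Assumption: linear-model imputation using the two wind components only
ocean_imp_lm_wind <- bind_shadow(oceanbuoys) %>%
  impute_lm(air_temp_c ~ wind_ew + wind_ns) %>%
  impute_lm(humidity ~ wind_ew + wind_ns) %>%
  add_label_shadow()

# Same linear models with year added as a predictor
ocean_imp_lm_wind_year <- bind_shadow(oceanbuoys) %>%
  impute_lm(air_temp_c ~ wind_ew + wind_ns + year) %>%
  impute_lm(humidity ~ wind_ew + wind_ns + year) %>%
  add_label_shadow()

# Stack the three imputed datasets; imp_model records which model each row came from
bound_models <- bind_rows(mean = ocean_imp_mean,
                          lm_wind = ocean_imp_lm_wind,
                          lm_wind_year = ocean_imp_lm_wind_year,
                          .id = "imp_model")

# One panel per imputation model, with imputed rows distinguished by color
p <- ggplot(bound_models,
            aes(x = air_temp_c, y = humidity, color = any_missing)) +
  geom_point() +
  facet_wrap(~imp_model)
```

Printing p draws the faceted scatter plot. Because every model is stored in the same long data frame, adding a further imputation model only requires one more named argument to bind_rows().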

Chapter 1 introduces you to missing data, explaining what missing values are, how they behave in R, how to detect them, and how to count them. We then introduce missing data summaries: how to summarise missingness across cases and variables, and how to explore it across groups within the data. Finally, we discuss missing data visualizations: how to produce overview visualizations for the entire dataset and over variables, cases, and other summaries, and how to explore these across groups.

Exercise 1: Introduction to missing data
Exercise 2: Using and finding missing values
Exercise 3: How many missing values are there?
Exercise 4: Working with missing values
Exercise 5: Why care about missing values?
Exercise 6: Summarizing missingness
Exercise 7: Tabulating Missingness
Exercise 8: Other summaries of missingness
Exercise 9: How do we visualize missing values?
Exercise 10: Your first missing data visualizations
Exercise 11: Visualizing missing cases and variables
Exercise 12: Visualizing missingness patterns

In chapter two, you will learn how to uncover hidden missing values like "missing" or "N/A" and replace them with `NA`. You will learn how to efficiently handle implicit missing values - those values implied to be missing, but not explicitly listed. We also cover how to explore missing data dependence, discussing Missing Completely at Random (MCAR), Missing At Random (MAR), Missing Not At Random (MNAR), and what they mean for your data analysis.

Exercise 1: Searching for and replacing missing values
Exercise 2: Using miss_scan_count
Exercise 3: Using replace_with_na
Exercise 4: Using replace_with_na scoped variants
Exercise 5: Filling down missing values
Exercise 6: Fix implicit missings using complete()
Exercise 7: Fix explicit missings using fill()
Exercise 8: Using complete() and fill() together
Exercise 9: Missing Data dependence
Exercise 10: Differences between MCAR and MAR
Exercise 11: Exploring missingness dependence
Exercise 12: Further exploring missingness dependence

In this chapter, you will learn about workflows for working with missing data. We introduce special data structures, the shadow matrix and nabular data, and demonstrate how to use them in workflows for exploring missing data so that you can link summaries of missingness back to values in the data. You will learn how to use ggplot to explore and visualize how values change as other variables go missing. Finally, you will learn how to visualize missingness across two variables, and how and why to visualize missings in a scatterplot.

Exercise 1: Tools to explore missing data dependence
Exercise 2: Creating shadow matrix data
Exercise 3: Performing grouped summaries of missingness
Exercise 4: Further exploring more combinations of missingness
Exercise 5: Visualizing missingness across one variable
Exercise 6: Nabular data and filling by missingness
Exercise 7: Nabular data and summarising by missingness
Exercise 8: Explore variation by missingness: box plots
Exercise 9: Visualizing missingness across two variables
Exercise 10: Exploring missing data with scatter plots
Exercise 11: Using facets to explore missingness
Exercise 12: Faceting to explore missingness (multiple plots)

In this chapter, you will learn about filling in the missing values in your data, which is called imputation. You will learn how to impute and track missing values, and what the good and bad features of imputations are so that you can explore, visualise, and evaluate the imputed data against the original values. You will learn how to use, evaluate, and compare different imputation models, and explore how different imputation models affect the inferences you can draw from the models.

Exercise 1: Filling in the blanks
Exercise 2: Impute data below range with nabular data
Exercise 3: Visualize imputed values in a scatter plot
Exercise 4: Create histogram of imputed data
Exercise 5: What makes a good imputation
Exercise 6: Evaluating bad imputations
Exercise 7: Evaluating imputations: The scale
Exercise 8: Evaluating imputations: Across many variables
Exercise 9: Performing imputations
Exercise 10: Using simputation to impute data
Exercise 11: Evaluating and comparing imputations
Exercise 12: Evaluating imputations (many models & variables) (current exercise)
Exercise 13: Evaluating imputations and models
Exercise 14: Combining and comparing many imputation models
Exercise 15: Evaluating the different parameters in the model
Exercise 16: Final Lesson