
Filling in the blanks

1. Performing and tracking imputation

Exploring missing data helps us understand the data and confirm that it is fit for analysis. Once we understand the relationships among the variables and their missingness, it is a good idea to impute the missing values so that we can perform the analysis.

2. Lesson overview

In this lesson, we are going to focus on using imputations to understand data structure, and on visualizing and exploring imputed values. You will develop two skills: imputing data while tracking missing values, and visualizing imputed values against the data. Some of these techniques might look familiar, because we covered them in previous lessons. This is one of the benefits of using naniar: the methods for exploring missing values are similar to those for exploring imputations.

3. Using imputations to understand data structure

In past lessons we have used geom_miss_point to explore missing values. This "shifted" the missing values below the range of the data so that we could see them. It also actually performed some imputations! We are going to recreate these visualizations using the impute_below function from naniar, which imputes values below the range of the data. For example, for this vector of numbers 5:10 with one missing value, it imputes the value 4.4 into the missing value, since this is lower than the lowest value of the data at hand, namely 5.
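As a minimal sketch of that vector example, assuming naniar is installed, the call below imputes the single missing value. The variable names x and x_imp are only illustrative, and the exact imputed value depends on naniar's default shift and jitter settings, so the code checks that it falls below the observed minimum rather than assuming a specific number.

```r
library(naniar)

# The vector 5:10 with one value replaced by NA
x <- c(5, 6, 7, NA, 9, 10)

# Impute the missing value below the range of the observed data
x_imp <- impute_below(x)
x_imp

# The imputed value should sit below the smallest observed value
x_imp[is.na(x)] < min(x, na.rm = TRUE)
```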

4. impute_below

impute_below has some useful variations that give the flexibility to apply it to some, or all, variables of a data.frame. impute_below_if only imputes variables that satisfy a condition, such as whether the column is numeric, tested with is.numeric. impute_below_at imputes the variables specified inside vars. And impute_below_all imputes all variables.
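The three variants can be sketched side by side on the built-in airquality data. This is an illustration rather than code from the lesson: the object names aq_if, aq_at and aq_all are made up here, and vars() is taken from dplyr.

```r
library(naniar)
library(dplyr)

# Impute only the columns that satisfy a predicate (here: numeric columns)
aq_if <- impute_below_if(airquality, is.numeric)

# Impute only the columns named inside vars()
aq_at <- impute_below_at(airquality, vars(Ozone))

# Impute every column
aq_all <- impute_below_all(airquality)

# Ozone is imputed in aq_at, while Solar.R keeps its missing values
anyNA(aq_at$Ozone)
anyNA(aq_at$Solar.R)
```

Because every column of airquality is numeric, impute_below_if with is.numeric and impute_below_all touch the same columns here; the two differ only on data with non-numeric columns.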

5. Tracking missing values

We need to track the missing values once we impute them; otherwise we don't know what was imputed and what was not. We can see that in our example: once we impute the data, we have no way to recognize which value was imputed.

6. Tracking missing values

We can identify missings by using bind_shadow to turn the data into nabular form. Now when we impute the data, we can see that the shadow variable, var1_NA, reveals the imputed value, 4.40.
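A small sketch of this tracking step, assuming a one-column data frame like the one described (the names df and df_imp are illustrative): bind_shadow adds the var1_NA column before imputation, so the imputed row can still be picked out afterwards.

```r
library(naniar)
library(dplyr)

# One variable with a single missing value
df <- data.frame(var1 = c(5, 6, 7, NA, 9, 10))

# Bind the shadow first, then impute, so the missingness is recorded
df_imp <- df %>%
  bind_shadow() %>%
  impute_below_all()

df_imp

# Rows whose var1 was originally missing are flagged "NA" in var1_NA
df_imp %>% filter(var1_NA == "NA")
```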

7. Visualize imputed values against data values using histograms

Using this imputed data, we can explore the number of missings in a single variable, along with its distribution, using a histogram and coloring the missings with fill = Ozone_NA. Here the missing values appear as two bars of around 20 each, so just under 40 missing values in total.
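The histogram might be built along these lines, assuming the airquality data and ggplot2; the names aq_imp and p_hist are illustrative. The last line counts the flagged values directly, which is a useful check on what is read off the bars.

```r
library(naniar)
library(dplyr)
library(ggplot2)

# Track missingness, then impute below the range of each variable
aq_imp <- airquality %>%
  bind_shadow() %>%
  impute_below_all()

# Histogram of Ozone, filled by whether the value was originally missing
p_hist <- ggplot(aq_imp, aes(x = Ozone, fill = Ozone_NA)) +
  geom_histogram()
p_hist

# Number of Ozone values that were imputed
sum(aq_imp$Ozone_NA == "NA")
```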

8. Visualize imputed values against data values using facets

We can take this same plot and visualize it across facets. For example, we can plot it by Month, which shows us that most missing values occur in month 6, a month that didn't have many high values of Ozone.

9. Visualize imputed values using facets

We can split the plot according to the missingness of Solar Radiation by referring to it as Solar.R_NA. This shows us that there aren't many missing values in Ozone when Solar Radiation is missing.
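Both faceted views can be sketched by adding facet_wrap to the same histogram, again assuming airquality and ggplot2. The object names are illustrative, and the tapply line is an extra check on the per-month counts that is not part of the lesson.

```r
library(naniar)
library(dplyr)
library(ggplot2)

aq_imp <- airquality %>%
  bind_shadow() %>%
  impute_below_all()

p_base <- ggplot(aq_imp, aes(x = Ozone, fill = Ozone_NA)) +
  geom_histogram()

# One panel per month
p_month <- p_base + facet_wrap(~ Month)

# One panel per missingness status of Solar.R
p_solar <- p_base + facet_wrap(~ Solar.R_NA)

p_month
p_solar

# Count of originally missing Ozone values in each month
ozone_na_by_month <- tapply(aq_imp$Ozone_NA == "NA", aq_imp$Month, sum)
ozone_na_by_month
```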

10. Visualize imputed values against data values using scatter plots

To visualize missings for two variables, we need to add a label that identifies whether a row has a missing value in any column. The function add_label_shadow does this for us. Coloring the points of a scatter plot by this label recreates the same figure as geom_miss_point!
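A sketch of the scatter plot, assuming airquality and ggplot2 (the names aq_lab and p_scatter are illustrative): add_label_shadow adds an any_missing column, which is then mapped to the point colour.

```r
library(naniar)
library(dplyr)
library(ggplot2)

# Track missingness, label rows with any missing value, then impute
aq_lab <- airquality %>%
  bind_shadow() %>%
  add_label_shadow() %>%
  impute_below_all()

# Scatter plot with imputed points coloured by the any_missing label
p_scatter <- ggplot(aq_lab, aes(x = Ozone, y = Solar.R, colour = any_missing)) +
  geom_point()
p_scatter
```

Labelling before imputing matters in this sketch: the label is derived from the shadow columns, which record missingness as it was in the original data.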

11. Let's practice!

naniar has a similar workflow for exploring missing values and imputations, which helps make it easy to learn! Now let's practice!