
Visualizing missingness across one variable

1. Exploring conditional missings with ggplot

Now that we've seen some ways to summarize missingness with nabular data, we are going to use nabular data to explore how the values of one variable change as other variables go missing.

2. What we are going to cover

In this lesson we are going to cover how to explore missingness using ggplot2 visualizations. We will look at densities, box plots, and some ways of splitting a plot into separate panels or colors according to missingness.

3. Visualizing missings using densities

To begin, we can look at the distribution of Temperature using ggplot, placing Temp on the x axis, and then using geom_density to draw temperature as a density, that is, as a smoothed distribution.
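As a minimal sketch of this first plot: the transcript does not name the dataset, so the code below assumes it is R's built-in `airquality` data, which has `Temp` and `Ozone` columns. The object name `p_density` is only illustrative.

```r
library(ggplot2)

# Assumption: the built-in `airquality` data stands in for the lesson's data.
# Temp goes on the x axis; geom_density() draws its smoothed distribution.
p_density <- ggplot(airquality, aes(x = Temp)) +
  geom_density()

p_density
```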

4. Visualizing missings using densities

To explore how Temperature changes when Ozone is missing, we create the nabular data with bind_shadow, and then set color equal to Ozone_NA in our aesthetics. This splits the single density into two: one for Temperature when Ozone is present, and one for Temperature when Ozone is missing. The two densities look much alike, which shows us that the values of Temperature don't change much whether Ozone is present or missing.
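The step described here could look roughly like the following, again assuming the `airquality` data. `bind_shadow()` is the naniar function named in the lesson; it appends one `_NA` column per variable, so `Ozone_NA` records whether `Ozone` is missing in each row. The names `aq_shadow` and `p_density_by_ozone` are illustrative.

```r
library(ggplot2)
library(naniar)

# Assumption: `airquality` stands in for the lesson's data.
# bind_shadow() adds shadow columns such as Ozone_NA to the original data.
aq_shadow <- bind_shadow(airquality)

# Coloring by Ozone_NA draws one Temp density per missingness level.
p_density_by_ozone <- ggplot(aq_shadow, aes(x = Temp, color = Ozone_NA)) +
  geom_density()

p_density_by_ozone
```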

5. Visualizing missings using box plots

Similarly, you can use box plots to explore missing data, by putting the missingness indicator you are interested in (Ozone_NA) on the x axis and Temperature on the y axis, then using geom_boxplot. Here we note that the values of temperature are similar when Ozone is missing versus not missing. There is generally less variation in temperature when Ozone is missing, although there are also some temperature outliers.
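A sketch of the box plot version, under the same assumption that the data is `airquality`; the object names are illustrative.

```r
library(ggplot2)
library(naniar)

# Assumption: `airquality` stands in for the lesson's data.
aq_shadow <- bind_shadow(airquality)

# One box of Temp values per level of Ozone_NA (missing vs. not missing).
p_box <- ggplot(aq_shadow, aes(x = Ozone_NA, y = Temp)) +
  geom_boxplot()

p_box
```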

6. Visualizing missings using facets

We can also visualize the two densities of temperature, split by the missingness of Ozone, using facets. This is similar to the previous density visualization, except that the densities are not overlaid: they are faceted, so each appears in its own panel. Here, we use nabular data to create a density plot, using facet_wrap(~ Ozone_NA). Splitting by facet can be useful if you want to compare different types of visualization.
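The faceted density could be sketched as follows, assuming the `airquality` data; only `facet_wrap(~ Ozone_NA)` comes from the lesson itself, and the object names are illustrative.

```r
library(ggplot2)
library(naniar)

# Assumption: `airquality` stands in for the lesson's data.
aq_shadow <- bind_shadow(airquality)

# facet_wrap() puts each level of Ozone_NA in its own panel
# instead of overlaying the two densities.
p_facet_density <- ggplot(aq_shadow, aes(x = Temp)) +
  geom_density() +
  facet_wrap(~ Ozone_NA)

p_facet_density
```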

7. Visualizing missings using facets

You can also look at two scatter plots of Temperature against Wind, faceting by the missingness of Ozone using Ozone_NA. Here we note that there are fewer wind and temperature observations when Ozone is missing, and that these tend to occur for temperatures over 70 and wind speeds over 5. Overall, the values of wind and temperature when Ozone is missing seem similar to when Ozone is present.
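A sketch of the faceted scatter plot, assuming the `airquality` data. The transcript does not say which variable goes on which axis, so putting `Wind` on x and `Temp` on y is a guess.

```r
library(ggplot2)
library(naniar)

# Assumption: `airquality` stands in for the lesson's data.
aq_shadow <- bind_shadow(airquality)

# Assumption: Wind on x and Temp on y; the lesson does not state the axes.
# Each panel shows the points for one level of Ozone_NA.
p_facet_scatter <- ggplot(aq_shadow, aes(x = Wind, y = Temp)) +
  geom_point() +
  facet_wrap(~ Ozone_NA)

p_facet_scatter
```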

8. Visualizing missings using color

As an alternative to the previous faceted plot, you can color the points according to whether Ozone is missing. This overlays the points in a single plot rather than creating separate plots. This can sometimes make comparisons easier, although this is not always the case.
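The colored version might look like this, under the same assumptions as before (`airquality` data, `Wind` on x and `Temp` on y, illustrative object names).

```r
library(ggplot2)
library(naniar)

# Assumption: `airquality` stands in for the lesson's data.
aq_shadow <- bind_shadow(airquality)

# Same points as a faceted scatter plot, but overlaid in one panel
# and distinguished by color instead.
p_color_scatter <- ggplot(aq_shadow, aes(x = Wind, y = Temp, color = Ozone_NA)) +
  geom_point()

p_color_scatter
```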

9. Adding layers of missingness

An advantage of using facets to split by missingness is that it frees up color to show a second condition of missingness. For example, we can create two panels by the missingness of Solar Radiation, and then color the temperature densities by the missingness of Ozone. This shows us that there isn't much difference in temperature when Solar Radiation isn't missing, but when Solar Radiation is missing, the temperatures are quite low!
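Combining the two ideas could be sketched as below. This assumes the `airquality` data, where Solar Radiation is stored as `Solar.R`, so its shadow column from `bind_shadow()` is `Solar.R_NA`.

```r
library(ggplot2)
library(naniar)

# Assumption: `airquality` stands in for the lesson's data,
# with Solar Radiation stored in the column Solar.R.
aq_shadow <- bind_shadow(airquality)

# Panels split by missingness of Solar.R; within each panel,
# one Temp density per level of Ozone_NA.
p_two_conditions <- ggplot(aq_shadow, aes(x = Temp, color = Ozone_NA)) +
  geom_density() +
  facet_wrap(~ Solar.R_NA)

p_two_conditions
```

Groups with very few rows give unstable density estimates, so a panel with few observations is best read with caution.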

10. Let's practice!

Now that we've covered some methods for visually exploring missing data using nabular data and ggplot2, it's time to practice using them on some other data!