1. Visualizing missingness across two variables
Missing values are typically ignored in a scatter plot.
In this lesson we are going to talk about how to visualize missings in a scatter plot, and how and why that works.
2. The problem of visualizing missing data in two dimensions
The problem with drawing a scatter plot when the data has missing values is that the plot drops any observation - the entire row - that has a missing value in either variable.
ggplot is actually very nice here and gives a warning that missing values are being dropped. The same cannot be said of all other functions in R!
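To get a feel for how much is lost, here is a minimal sketch that counts the rows a scatter plot of Ozone against Solar.R would drop. It assumes the built-in airquality dataset and uses only base R; the variable names n_total and n_dropped are just for illustration.

```r
# Rows are dropped when either plotted variable is missing.
n_total <- nrow(airquality)
n_dropped <- sum(is.na(airquality$Ozone) | is.na(airquality$Solar.R))

n_total    # 153
n_dropped  # 42
```

So more than a quarter of the observations never make it onto the plot.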
3. Introduction to geom_miss_point()
To explore the missings in a scatter plot, we can use geom_miss_point.
geom_miss_point visualizes the missing values by placing them in the margins.
On the left, in red, we can see the values of Solar-dot-R when Ozone is missing.
This shows us that those values of solar radiation are reasonably uniform.
The values of Ozone when Solar-dot-R is missing are shown in red along the bottom. This shows us that the missing values tend to occur at lower values of Ozone.
In the bottom left we show cases where there are missings in both Ozone and Solar Radiation.
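A plot like the one described can be sketched as follows. This assumes the built-in airquality dataset, with Ozone on the x-axis and Solar.R on the y-axis, and that the ggplot2 and naniar packages are installed.

```r
library(ggplot2)
library(naniar)

# Missing values are drawn in the margins instead of being dropped.
p <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  geom_miss_point()
p
```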
To explain how and why this visualization works, we are going to take a brief moment to unpack the data transformation that occurs here.
4. Aside: How geom_miss_point() works
geom_miss_point performs a transformation on the data and actually imputes - that is, fills in - the values that are missing.
Take the example of just the Ozone data.
It does a special imputation: it imputes the missing values 10% below the minimum value - as shown here in the column ozone_shift - and then keeps track of where the missing values were, so it can show them in the visualization - as shown here with Ozone_NA.
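The idea can be sketched in base R. This is only an illustration, not naniar's actual code: shift_below is a hypothetical helper, and it assumes that "10% below the minimum" means 10% of the observed range below the observed minimum. The real implementation may differ, for example by adding a little jitter so the points do not overlap.

```r
# Hypothetical helper, not part of naniar: fill in missing values at a
# fixed proportion of the observed range below the observed minimum.
shift_below <- function(x, prop_below = 0.1) {
  x_min <- min(x, na.rm = TRUE)
  x_range <- max(x, na.rm = TRUE) - x_min
  ifelse(is.na(x), x_min - prop_below * x_range, x)
}

ozone <- data.frame(Ozone = airquality$Ozone)
ozone$Ozone_NA <- is.na(ozone$Ozone)            # track which rows were missing
ozone$ozone_shift <- shift_below(ozone$Ozone)   # filled-in values used for plotting
head(ozone)
```

Because the filled-in values sit below every observed value, they land in the margin of the plot, and the Ozone_NA column is what lets them be colored differently.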
We'll come back to this idea of tracking missing values in the next chapter.
5. Exploring missingness using facets
Because geom_miss_point is a defined ggplot2 geometry, it behaves like any other ggplot2 geom.
This means, for example, that you can use ggplot features like facets, to further explore your missing data.
For example, you can facet by Month, to explore how the missingness changes across months.
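As a sketch, again assuming the airquality dataset with Ozone and Solar.R on the axes, faceting by Month only takes one extra layer:

```r
library(ggplot2)
library(naniar)

# One panel per month, each with its own missing values in the margins.
p_month <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  geom_miss_point() +
  facet_wrap(~ Month)
p_month
```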
6. Exploring missingness using facets
You can even use nabular data from the previous lesson, and explore how the missingness changes depending on whether another variable is missing.
For example, you can explore how the missingness changes when Solar Radiation is missing.
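Here is a minimal sketch of that idea. It assumes the airquality dataset and uses naniar's bind_shadow() to create the nabular data; the choice of Wind and Ozone for the axes is just an illustrative assumption.

```r
library(ggplot2)
library(naniar)

# bind_shadow() adds a shadow column per variable, such as Solar.R_NA.
aq_nabular <- bind_shadow(airquality)

# One panel for rows where Solar.R is present, one for rows where it is missing.
p_shadow <- ggplot(aq_nabular, aes(x = Wind, y = Ozone)) +
  geom_miss_point() +
  facet_wrap(~ Solar.R_NA)
p_shadow
```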
7. Let's practice!
Now let's try some examples.