Using stat_sum
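In the Vocab dataset, education and vocabulary are integer variables. In the first course, you saw that this is one of the four causes of overplotting. Because both variables take only whole-number values, many observations share the same coordinates, so a plain scatter plot draws what looks like a single point at each intersection between the two variables.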
One solution, shown in step 1, is jittering with transparency. Another solution is to use stat_sum(), which counts the overlapping observations at each location and maps that count onto the size aesthetic.
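To make the counting step concrete, here is a minimal base R sketch of roughly the tally that stat_sum() builds before drawing. The toy data frame and its values are made up for illustration and only stand in for Vocab; the helper column n is likewise an assumption of this sketch, not part of the exercise.

```r
# Toy stand-in for Vocab: the values are invented for illustration only
toy <- data.frame(
  education  = c(12, 12, 12, 16, 16, 10),
  vocabulary = c(6, 6, 6, 8, 8, 5)
)

# Give every observation a weight of 1, then add the weights up
# within each unique (education, vocabulary) pair
toy$n <- 1
counts <- aggregate(n ~ education + vocabulary, data = toy, FUN = sum)

# One row per occupied location; n is the number of overlapping points
print(counts)
```

Each row of counts corresponds to one point that stat_sum() would draw, with n driving the point size.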
stat_sum() allows a special variable, ..prop.., to show the proportion of values within the dataset.
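As a hedged illustration of the two mappings, the sketch below draws a count-sized and a proportion-sized version of the same plot. It uses a made-up toy data frame rather than Vocab, and it writes the computed variable as after_stat(prop), the newer spelling of ..prop.. available from ggplot2 3.3.0; with an older ggplot2, aes(size = ..prop..) would be used instead.

```r
library(ggplot2)

# Toy stand-in for Vocab: the values are invented for illustration only
toy <- data.frame(
  education  = c(12, 12, 12, 16, 16, 10),
  vocabulary = c(6, 6, 6, 8, 8, 5)
)

# Default behaviour: point size reflects the count at each location
p_count <- ggplot(toy, aes(x = education, y = vocabulary)) +
  stat_sum()

# Point size reflects the proportion instead of the raw count;
# after_stat(prop) is the newer equivalent of ..prop..
p_prop <- ggplot(toy, aes(x = education, y = vocabulary)) +
  stat_sum(aes(size = after_stat(prop)))

# Inspect the values the stat computed for the second plot
prop_data <- layer_data(p_prop)
```

Printing p_count or p_prop renders the plot. With numeric x and y and no extra grouping, all rows fall into one group, so the proportions are taken over the whole dataset.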
This exercise is part of the course Intermediate Data Visualization with ggplot2.
Have a go at this exercise by completing this sample code.
# Run this, look at the plot, then update it
ggplot(Vocab, aes(x = education, y = vocabulary)) +
  # Replace this with a sum stat
  geom_jitter(alpha = 0.25)