Using stat_sum
In the Vocab dataset, education and vocabulary are integer variables. In the first course, you saw that this is one of the four causes of overplotting. You'd get a single point at each intersection between the two variables.
One solution, shown in step 1, is jittering with transparency. Another solution is to use stat_sum(), which counts the overlapping observations at each location and maps that count onto the size aesthetic.
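As a minimal sketch of that counting behaviour, the example below uses a small made-up data frame (`toy`, with hypothetical columns `x` and `y`) rather than Vocab, so it runs with only ggplot2 installed. It assumes the default behaviour of stat_sum(), where each distinct (x, y) pair becomes one point whose size reflects how many rows share that pair.

```r
library(ggplot2)

# Hypothetical toy data: integer-valued x and y with repeated pairs
toy <- data.frame(
  x = c(1, 1, 1, 2, 2, 3),
  y = c(1, 1, 1, 2, 2, 3)
)

# One point per distinct (x, y) pair, sized by the number of overlaps
p_sum <- ggplot(toy, aes(x = x, y = y)) +
  stat_sum()

# Inspect the values computed by the stat; the n column holds the counts
sum_data <- layer_data(p_sum)
print(sum_data[, c("x", "y", "n")])
```

Calling `print(p_sum)` draws the plot; `layer_data()` is used here only to look at the counts the stat computed.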
stat_sum() allows a special variable, ..prop.., to show the proportion of values within the dataset.
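A short sketch of mapping the proportion onto size follows, again on a made-up data frame (`toy`, hypothetical) rather than Vocab. It uses `after_stat(prop)`, which in ggplot2 3.3.0 and later is the equivalent of the `..prop..` notation; on older versions, `aes(size = ..prop..)` would be the form to use. With no discrete grouping variable, the proportions here are taken over the whole data frame.

```r
library(ggplot2)

# Hypothetical toy data: six rows falling on three distinct (x, y) pairs
toy <- data.frame(
  x = c(1, 1, 1, 2, 2, 3),
  y = c(1, 1, 1, 2, 2, 3)
)

# Map the computed proportion, not the raw count, onto point size
p_prop <- ggplot(toy, aes(x = x, y = y)) +
  stat_sum(aes(size = after_stat(prop)))

# The prop column holds each pair's share of the rows in its group
prop_data <- layer_data(p_prop)
print(prop_data[, c("x", "y", "n", "prop")])
```

Adding a `group` aesthetic inside `aes()` would change the denominator, so that proportions are computed within each group instead of across all rows.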
This exercise is part of the course Intermediate Data Visualization with ggplot2.
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Assumes Vocab comes from the carData package
library(ggplot2)
library(carData)

# Run this, look at the plot, then update it
ggplot(Vocab, aes(x = education, y = vocabulary)) +
  # Replace this with a sum stat
  geom_jitter(alpha = 0.25)