Get startedGet started for free

Using stat_sum

In the Vocab dataset, education and vocabulary are integer variables. In the first course, you saw that this is one of the four causes of overplotting. You'd get a single point at each intersection between the two variables.

One solution, shown in the step 1, is jittering with transparency. Another solution is to use stat_sum(), which calculates the total number of overlapping observations and maps that onto the size aesthetic.

stat_sum() allows a special variable, ..prop.., to show the proportion of values within the dataset.

This exercise is part of the course

Intermediate Data Visualization with ggplot2

View Course

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Run this, look at the plot, then update it
ggplot(Vocab, aes(x = education, y = vocabulary)) +
  # Replace this with a sum stat
  geom_jitter(alpha = 0.25)
Edit and Run Code