Exercise

Using stat_sum

In the Vocab dataset, education and vocabulary are integer variables. In the first course, you saw that integer data is one of the four causes of overplotting: in a plain scatter plot, you'd get a single point at each intersection between the two variables, no matter how many observations fall there.
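To see why integer data overplots, it can help to compare the number of observations with the number of distinct positions they occupy. The sketch below uses a simulated data frame named toy as a hypothetical stand-in for Vocab (the value ranges and sample size are assumptions chosen for illustration, not properties of the real dataset).

```r
# Hypothetical stand-in for Vocab: two integer variables with many repeated pairs.
# The ranges 0:20 and 0:10 and the sample size are assumptions for illustration.
set.seed(1)
toy <- data.frame(
  education  = sample(0:20, 1000, replace = TRUE),
  vocabulary = sample(0:10, 1000, replace = TRUE)
)

# Number of observations
n_obs <- nrow(toy)

# Number of distinct (education, vocabulary) positions on the plot
n_pairs <- nrow(unique(toy[, c("education", "vocabulary")]))

# With 21 x 11 possible positions, at most 231 distinct points can be drawn,
# so most observations sit exactly on top of another one.
cat(n_obs, "observations share", n_pairs, "plot positions\n")
```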

One solution, shown in step 1, is jittering with transparency. Another solution is to use stat_sum(), which counts the observations at each location and maps that count onto the size aesthetic.

stat_sum() also provides a special computed variable, ..prop.., which gives the proportion of observations at each location within the dataset instead of the raw count.
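The two approaches described above might look roughly like this. It is a minimal sketch, assuming ggplot2 is installed and again using a simulated toy data frame as a hypothetical stand-in for Vocab; the alpha value is an arbitrary choice. layer_data() is used only to peek at the values stat_sum() computes.

```r
library(ggplot2)

# Hypothetical stand-in for Vocab (simulated integer data, assumed ranges).
set.seed(1)
toy <- data.frame(
  education  = sample(0:20, 1000, replace = TRUE),
  vocabulary = sample(0:10, 1000, replace = TRUE)
)

p <- ggplot(toy, aes(x = education, y = vocabulary))

# Option 1: spread the points with random noise and make them see-through.
p_jitter <- p + geom_jitter(alpha = 0.2)

# Option 2: draw one point per location, sized by the number of observations there.
p_sum <- p + stat_sum()

# Inspect what stat_sum() computed: n is the count at each (x, y) location,
# prop is that count as a proportion of its group.
sum_data <- layer_data(p_sum)
head(sum_data[, c("x", "y", "n", "prop")])
```

Printing p_jitter or p_sum draws the corresponding plot.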

Instructions 1/4

  • 1
    • Run the code to see how jittering and transparency solve overplotting.
    • Replace the jittered points with a sum stat, using stat_sum().
  • 2

    Modify the size aesthetic with the appropriate scale function.

    • Add a scale_size() function to set the range from 1 to 10.
  • 3

    Inside stat_sum(), set size to ..prop.. so circle size represents the proportion of the whole dataset.

  • 4

    Update the plot to group by education, so that circle size represents the proportion of the group.
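
Steps 2 to 4 could be sketched as follows. This is not the exercise's reference solution: it uses a simulated toy data frame as a hypothetical stand-in for Vocab, and it writes the computed variable as after_stat(prop), the newer spelling of ..prop.., on the assumption that ggplot2 3.3.0 or later is installed.

```r
library(ggplot2)

# Hypothetical stand-in for Vocab (simulated integer data, assumed ranges).
set.seed(1)
toy <- data.frame(
  education  = sample(0:20, 1000, replace = TRUE),
  vocabulary = sample(0:10, 1000, replace = TRUE)
)

# Step 2: control how counts translate into point sizes.
p_range <- ggplot(toy, aes(x = education, y = vocabulary)) +
  stat_sum() +
  scale_size(range = c(1, 10))

# Step 3: size by proportion of the whole dataset instead of raw count.
# after_stat(prop) plays the role of ..prop.. here.
p_prop <- ggplot(toy, aes(x = education, y = vocabulary)) +
  stat_sum(aes(size = after_stat(prop))) +
  scale_size(range = c(1, 10))

# Step 4: group by education, so prop is computed within each education level.
p_group <- ggplot(toy, aes(x = education, y = vocabulary, group = education)) +
  stat_sum(aes(size = after_stat(prop))) +
  scale_size(range = c(1, 10))

# Check the difference: proportions sum to 1 overall in step 3,
# and to 1 within each education level in step 4.
prop_total    <- sum(layer_data(p_prop)$prop)
group_data    <- layer_data(p_group)
prop_by_group <- tapply(group_data$prop, group_data$x, sum)
```

Without the group aesthetic, every location is compared against the full dataset; with group = education, each circle is compared only against the other observations at the same education level.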