Exercise

The sunflowerplot() function for repeated numerical data

A scatterplot represents each (x, y) pair in a dataset by a single point. If some of these pairs are repeated (i.e., the same combination of x and y values appears more than once, so the points lie exactly on top of each other), the scatterplot gives no indication of this. Several approaches have been developed to deal with this problem, including jittering, which adds a small random value to each x and y value so that repeated points appear as clusters of nearby points.
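As a minimal sketch of jittering, the following uses base R's jitter() function on a small, made-up dataset. The vectors x and y are hypothetical and chosen only so that some pairs repeat; they are not part of the exercise data.

```r
# Hypothetical toy data: the pair (1, 1) occurs three times, (2, 2) twice
x <- c(1, 1, 1, 2, 2, 3)
y <- c(1, 1, 1, 2, 2, 3)

# Count how often each distinct (x, y) pair occurs
pair_counts <- table(paste(x, y))

# Fix the random seed so the jittered positions are reproducible
set.seed(1)

# Add a small amount of random noise to both coordinates before plotting,
# so repeated pairs show up as small clusters instead of a single point
plot(jitter(x), jitter(y), main = "Jittered scatterplot")
```

Because the noise is random, the exact positions change from run to run unless the seed is fixed, and the plotted values no longer match the recorded data exactly.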

A useful alternative for representing repeated data points is the sunflower plot, which draws each repeated point as a "sunflower" with one "petal" for each repetition of that data point.
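To illustrate the idea on a small scale, here is a sketch that applies sunflowerplot() to a made-up dataset. The vectors x and y are hypothetical; xyTable() from grDevices is used only to show the multiplicities that the plot encodes.

```r
# Hypothetical toy data: the pair (1, 1) occurs three times, (2, 2) twice
x <- c(1, 1, 1, 2, 2, 3)
y <- c(1, 1, 1, 2, 2, 3)

# Tabulate the distinct (x, y) pairs and how often each occurs
counts <- xyTable(x, y)
print(data.frame(x = counts$x, y = counts$y, number = counts$number))

# Repeated pairs are drawn with one petal per occurrence;
# a pair that occurs only once is drawn as a plain point
sunflowerplot(x, y, main = "Sunflower plot of toy data")
```

Unlike jittering, this keeps every point at its recorded coordinates and shows the repetition count directly.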

This exercise asks you to construct both a scatterplot and a sunflower plot from the same dataset, one that contains repeated data points. Comparing the two plots shows how much information a standard scatterplot can hide when some data points appear many times.

Instructions
  • Use the par() function to set the mfrow parameter for a side-by-side plot array.
  • For the left-hand plot, use the plot() function to construct a scatterplot of the rad variable versus the zn variable (zn on the x-axis, rad on the y-axis), both from the Boston data frame in the MASS package.
  • Use the title() function to add the title "Standard scatterplot" to this plot.
  • For the right-hand plot, apply the sunflowerplot() function to the same data to reveal the repeated data points that are not evident from the scatterplot on the left.
  • Use the title() function to add the title "Sunflower plot".
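One possible way to carry out these steps is sketched below. It assumes the MASS package is installed and reads "rad versus zn" as zn on the x-axis and rad on the y-axis; the variable name old_par is introduced here only for illustration.

```r
library(MASS)  # provides the Boston data frame

# Request a plot array with one row and two columns;
# par() returns the previous settings so they can be restored later
old_par <- par(mfrow = c(1, 2))

# Left-hand plot: standard scatterplot of rad versus zn
plot(Boston$zn, Boston$rad)
title("Standard scatterplot")

# Right-hand plot: the same data as a sunflower plot
sunflowerplot(Boston$zn, Boston$rad)
title("Sunflower plot")

# Restore the original graphics settings
par(old_par)
```

Restoring the saved settings is optional for the exercise, but it keeps later plots from being drawn into the same two-panel layout.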