Overplotting 1: large datasets

Scatter plots (using geom_point()) are intuitive, easily understood, and very common, but we must always consider overplotting, particularly in the following four situations:

  1. Large datasets
  2. Aligned values on a single axis
  3. Low-precision data
  4. Integer data

Typically, alpha blending (i.e. adding transparency) is recommended when using solid shapes. Alternatively, you can use opaque, hollow shapes.

Small points are suitable for large datasets with regions of high density (lots of overlapping).
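
As a rough illustration of the three options above (alpha blending, hollow shapes, small points), here is a minimal sketch on simulated data. The data frame `df`, the plot names, and the specific alpha and size values are assumptions chosen for the example, not part of the exercise.

```r
library(ggplot2)

# Simulated data (hypothetical): 10,000 points concentrated around the origin
set.seed(1)
df <- data.frame(x = rnorm(10000), y = rnorm(10000))

p <- ggplot(df, aes(x = x, y = y))

# Option 1: solid points with alpha blending, so dense regions appear darker
p_alpha <- p + geom_point(alpha = 0.2)

# Option 2: opaque, hollow circles (shape 1), so overlapping outlines stay visible
p_hollow <- p + geom_point(shape = 1)

# Option 3: smaller points, which reduce overlap in high-density regions
p_small <- p + geom_point(size = 0.5)
```

Printing any of the three objects (for example, `p_alpha`) renders the plot, which makes it easy to compare how each option handles the dense center.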

Let's use the diamonds dataset to practice dealing with the large dataset case.

Instructions 1/2
  1. Add a points layer to the base plot.
     • Set the point transparency to 0.5.
     • Set shape = ".", which draws each point at a size of 1 pixel.
  2. Update the point shape to remove the line outlines by setting shape to 16.
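
The two steps could be sketched as follows. The base plot is not shown in this exercise text, so the object name `plt_price_vs_carat` and its mapping of carat to x and price to y are assumptions made for illustration.

```r
library(ggplot2)

# Assumed base plot (hypothetical name and aesthetics): price against carat
plt_price_vs_carat <- ggplot(diamonds, aes(x = carat, y = price))

# Step 1: add a points layer with transparency 0.5 and shape "."
plt_step1 <- plt_price_vs_carat +
  geom_point(alpha = 0.5, shape = ".")

# Step 2: switch to shape 16, a solid circle without a line outline
plt_step2 <- plt_price_vs_carat +
  geom_point(alpha = 0.5, shape = 16)
```

Printing `plt_step1` and `plt_step2` in turn shows the difference between the two shapes at the same transparency.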