Overplotting 1: large datasets
Scatter plots (using geom_point()) are intuitive, easily understood, and very common, but we must always consider overplotting, particularly in the following four situations:
- Large datasets
- Aligned values on a single axis
- Low-precision data
- Integer data
Typically, alpha blending (i.e. adding transparency) is recommended when using solid shapes. Alternatively, you can use opaque, hollow shapes.
Small points suit large datasets with regions of high density, where many points overlap.
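As a rough sketch of the three options just described, the following builds each one on the diamonds dataset that ships with ggplot2. The object names (p, p_alpha, p_hollow, p_small) and the alpha value of 0.2 are illustrative assumptions, not values prescribed by the course.

```r
library(ggplot2)

# Base plot; diamonds ships with ggplot2
p <- ggplot(diamonds, aes(carat, price))

# Option 1: solid shapes with alpha blending (0.2 is an arbitrary choice)
p_alpha <- p + geom_point(alpha = 0.2)

# Option 2: opaque, hollow circles (shape 1 is an unfilled circle)
p_hollow <- p + geom_point(shape = 1)

# Option 3: very small points (shape "." draws roughly one pixel per point)
p_small <- p + geom_point(shape = ".")

# Print any of the objects to render it, for example:
p_alpha
```

Which option reads best depends on the data: alpha blending shows density as darkness, while hollow or tiny points keep individual observations distinguishable.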
Let's use the diamonds dataset to practice dealing with the large dataset case.
This exercise is part of the course
Introduction to Data Visualization with ggplot2
Hands-on interactive exercise
Try this exercise by completing the sample code below.
# Plot price vs. carat, colored by clarity
plt_price_vs_carat_by_clarity <- ggplot(diamonds, aes(carat, price, color = clarity))
# Add a point layer with tiny points
plt_price_vs_carat_by_clarity + ___
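One possible way to fill in the blank is sketched below. It assumes that "tiny points" means the "." shape and that some transparency is also wanted. The alpha value of 0.5 and the name plt_tiny are assumptions made for illustration, so the exercise's expected answer may differ.

```r
library(ggplot2)

# Plot price vs. carat, colored by clarity
plt_price_vs_carat_by_clarity <- ggplot(diamonds, aes(carat, price, color = clarity))

# Add a point layer with tiny points:
# shape = "." draws each observation at roughly one pixel,
# and alpha = 0.5 (an assumed value) lets dense regions appear darker
plt_tiny <- plt_price_vs_carat_by_clarity +
  geom_point(alpha = 0.5, shape = ".")

# Render the plot
plt_tiny
```

If the pixel-sized points turn out too faint, a small solid shape with transparency, such as geom_point(alpha = 0.5, shape = 16), is a reasonable alternative to try.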