
Scatter plots

1. Scatter plots

The third essential layer is the geometry layer. This determines how the plot actually looks. We've already seen many geometries in action, so let's take a closer look.

2. 48 geometries

At present there are almost 50 different geometries to choose from, although there are some redundancies. Each one can be accessed using its own geom_ function. As the domain specialist, it's your job to choose the best geom, but there are some useful guidelines.

3. Common plot types

Let's begin with scatter plots.

4. Scatter plots

Each geom is associated with specific aesthetic mappings, some of which are essential. To use geom_point, we need the x and y aesthetics.
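As a minimal sketch of those essential aesthetics, here is a scatter plot of the built-in iris data set. The choice of Sepal.Length for x and Sepal.Width for y is an assumption made for illustration.

```r
library(ggplot2)

# geom_point() needs at least the x and y aesthetics.
# Which columns go on which axis is an illustrative choice.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

p
```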

5. Scatter plots

In addition to the essential aesthetics, we can also choose optional aesthetics, like alpha, color, fill, shape, size or stroke. All of these can also be set as attributes, as we discussed earlier.
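To make the difference concrete, the sketch below maps color to a variable inside aes() and sets size and shape as fixed attributes inside the geom. The particular values (size 3, shape 17) are arbitrary choices for illustration.

```r
library(ggplot2)

# color is an aesthetic mapping: it varies with the Species column.
# size and shape are attributes: fixed values applied to every point.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, shape = 17)

p
```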

6. Geom-specific aesthetic mappings

We can specify both geom-specific data and aesthetics. This allows us to control the information for each layer independently.

7. iris demo

Imagine I have a data frame which contains summary statistics, such as the mean, for each of my variables. In this case it's the average sepal width and length for each of the three iris species. ggplot2 can actually take care of the statistics for us, so we don't need to calculate them ourselves beforehand, but let's see how to use them if we have. To show all the individual points and have the mean of the x and y plotted on top, I could add another geom_point layer accessing this data set.
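One way such a summary data frame could be built is with base R's aggregate(). The name iris_summary is hypothetical, and this is only one of several ways to compute per-species means.

```r
# Mean sepal length and width for each species.
# iris_summary is a hypothetical name chosen for this sketch.
iris_summary <- aggregate(
  cbind(Sepal.Length, Sepal.Width) ~ Species,
  data = iris,
  FUN = mean
)

iris_summary
```

The result has one row per species and keeps the original column names, so it can reuse the same aesthetic mappings as the full data set.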

8. iris plot

In this plot, one geom_point layer inherits both the data and the aesthetics from the parent ggplot function, while in the other I specify a different data set. Note that the second layer still inherits the aesthetics, just like the first geom function. I've changed the shape and the size attributes of the points so that they are distinguishable from the background points.
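A sketch of a plot like the one described, assuming a hypothetical summary data frame called iris_summary and illustrative values for shape and size:

```r
library(ggplot2)

# Hypothetical summary data frame: per-species means.
iris_summary <- aggregate(
  cbind(Sepal.Length, Sepal.Width) ~ Species,
  data = iris,
  FUN = mean
)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  # First layer: inherits data and aesthetics from ggplot().
  geom_point() +
  # Second layer: its own data, but the same inherited aesthetics.
  # shape = 15 (solid square) and size = 5 are illustrative choices.
  geom_point(data = iris_summary, shape = 15, size = 5)

p
```

This works because iris_summary has the same column names as iris, so the inherited x, y and color mappings still resolve.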

9. Shape attribute values

The possible values are shown here. 15 is a solid square. Numbers 21 to 25 are not simply repeats of earlier codes: these shapes have both fill and color, which can be controlled independently.
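If the chart of shape codes isn't at hand, a rough equivalent can be drawn directly. This is a sketch, with the grid layout and the red fill chosen only to make the fillable shapes stand out.

```r
library(ggplot2)

# One row per shape code, laid out on a simple grid.
shapes <- data.frame(
  code = 0:25,
  x = 0:25 %% 7,
  y = -(0:25 %/% 7)
)

p <- ggplot(shapes, aes(x = x, y = y)) +
  # fill only affects shapes 21 to 25; the others ignore it.
  geom_point(aes(shape = code), size = 5, fill = "red") +
  geom_text(aes(label = code), nudge_y = -0.35) +
  # Use the numeric codes as shapes directly.
  scale_shape_identity() +
  theme_void()

p
```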

10. Example

For example, I can have a black fill and use a stroke of 2 for a thick outline. The color aesthetic is still inherited from the parent layer. Imagine I wanted to have crosshairs marking where each mean value appears on the plot.
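Here is a sketch of both ideas, again assuming a hypothetical iris_summary data frame. Drawing the crosshairs with geom_vline and geom_hline is one possible approach, and the dashed linetype is an illustrative choice.

```r
library(ggplot2)

# Hypothetical summary data frame: per-species means.
iris_summary <- aggregate(
  cbind(Sepal.Length, Sepal.Width) ~ Species,
  data = iris,
  FUN = mean
)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  # Shape 21 has both fill and color: black fill, thick outline,
  # outline color still inherited from the parent layer.
  geom_point(data = iris_summary, shape = 21, size = 5,
             fill = "black", stroke = 2) +
  # Crosshairs through each mean (one possible approach).
  geom_vline(data = iris_summary, aes(xintercept = Sepal.Length, color = Species),
             linetype = "dashed") +
  geom_hline(data = iris_summary, aes(yintercept = Sepal.Width, color = Species),
             linetype = "dashed")

p
```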

11. On-the-fly stats by ggplot2

It's not fair to plot the mean without some measure of spread, like the standard deviation. We'll get into that in the next course when we discuss the stats layer.

12. position = "jitter"

Recall that in the last chapter we used the position argument to change the position from identity to jitter.

13. geom_jitter()

We could have also done this with the geom_jitter function directly. geom_jitter is just a wrapper for geom_point with position set to jitter.
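The two forms can be written side by side, as in this sketch. Because jittering is random, the two plots will not be pixel-identical unless a seed is set.

```r
library(ggplot2)

base <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species))

# Form 1: geom_point with the position argument.
p1 <- base + geom_point(position = "jitter")

# Form 2: the geom_jitter convenience function.
p2 <- base + geom_jitter()

p1
p2
```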

14. Don't forget to adjust alpha

On top of jittering, we would also need to deal with overplotting of points by adjusting the alpha-blending, which works great as an attribute. This helps us to see regions of high density.
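As a sketch, alpha can be set as an attribute on the jittered layer. The value 0.6 is an arbitrary starting point; the right value depends on how dense the data are.

```r
library(ggplot2)

# alpha is set as an attribute, so every point gets the same transparency.
# Overlapping points then appear darker, revealing dense regions.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_jitter(alpha = 0.6)

p
```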

15. Hollow circles also help

Yet another way to deal with overplotting is to change the symbol to a hollow circle, which is shape 1. Both of these options help with visual communication because they aid in perception. We can more accurately and quickly see what the data is actually showing, even if the jittering adds some random noise to both axes! It's always recommended to optimize the shape, size and alpha blending of points in a scatter plot.
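A minimal sketch of the hollow-circle option, combined with jittering:

```r
library(ggplot2)

# shape = 1 is a hollow circle, so overlapping points stay visible.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_jitter(shape = 1)

p
```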

16. Let's practice!

Let's head over to the exercises to understand what overplotting is and how to deal with it.
