
Higher dimensions

1. Higher dimensions

So far, you've seen how to visualize one or two variables. But what happens with more than two variables?

2. The UN life expectancy scatter plots

Here are the plots of life expectancy using the UN dataset. You visualized life expectancy against length of schooling and GNI per capita separately. What if you wanted to see the effect of both variables on life expectancy together?

3. 3D scatter plots

The obvious answer is to draw a 3D scatter plot. Unfortunately, that's a terrible idea. With a three-dimensional object on a two-dimensional screen, you lose all sense of perspective, and it's difficult to interpret. Sometimes, drawing the plot at different angles can assist interpretation, but most angles are unhelpful.

4. x and y are not the only dimensions

Fortunately, there are other ways of drawing more than two dimensions on a flat screen. For points in a scatter plot, you can represent values using different colors, sizes, levels of transparency, and shapes.

5. Color

Here, length of schooling and GNI are shown on the x and y axes, and life expectancy is represented on a color axis. Shorter life expectancies are shown in blue, moving through green to yellow for longer life expectancies. The yellow dots are in the top right, meaning that countries with the longest life expectancies have both long schooling times and high GNI. As a bonus, you can see the positive correlation between schooling and GNI. One downside is that you can't see precise values for life expectancy. You can estimate to within about five years, but you couldn't say for certain whether a color corresponds to 64 years or 65 years.
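To see why neighboring values are hard to tell apart on a color axis, here is a minimal sketch of a blue-to-green-to-yellow scale. The color stops, the 50 to 85 year range, and the function names are assumptions made for illustration, not the scale used in the plot.

```python
def lerp(a, b, t):
    """Linearly interpolate between a and b for t in [0, 1]."""
    return a + (b - a) * t

# Hypothetical three-stop color scale: blue -> green -> yellow (RGB, 0-255).
STOPS = [(0, 0, 255), (0, 160, 0), (255, 255, 0)]

def value_to_rgb(value, vmin, vmax, stops=STOPS):
    """Map a value in [vmin, vmax] to an RGB tuple on the color scale."""
    # Clamp the value into range and normalize it to [0, 1].
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    # Find the pair of stops the value falls between.
    scaled = t * (len(stops) - 1)
    i = min(int(scaled), len(stops) - 2)
    local_t = scaled - i
    start, end = stops[i], stops[i + 1]
    return tuple(round(lerp(c0, c1, local_t)) for c0, c1 in zip(start, end))

# Made-up life expectancies in years, not values from the UN dataset.
for years in (50, 64, 65, 85):
    print(years, value_to_rgb(years, vmin=50, vmax=85))
```

The two ends of the scale come out as clearly different colors, but 64 and 65 land on greens that differ by only a few units per channel, which is why you can't read exact values off the plot.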

6. Size

A second option is to change the size of the points, with larger points representing larger numbers. This works, but it has more problems than color. Firstly, larger points can seem more important. Sometimes that might be acceptable, but if you want people to concentrate on countries with a low life expectancy, it's a bad choice. Secondly, it's harder to judge precise life expectancies than with color. Thirdly, the large points tend to overlap, making it difficult to distinguish individual countries.
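As a rough sketch of a size axis, the function below maps a value onto the area of a point. Scaling the area rather than the radius is an assumption here (a common convention, not something the plot is confirmed to use), and the area limits and function names are made up for illustration.

```python
import math

def value_to_area(value, vmin, vmax, min_area=10.0, max_area=200.0):
    """Map a value in [vmin, vmax] linearly onto a marker area."""
    # Clamp the value into range and normalize it to [0, 1].
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    return min_area + t * (max_area - min_area)

def area_to_radius(area):
    """Radius of a circular marker with the given area."""
    return math.sqrt(area / math.pi)

# Made-up life expectancies in years, not values from the UN dataset.
for years in (50, 64, 65, 85):
    area = value_to_area(years, vmin=50, vmax=85)
    print(years, round(area, 1), round(area_to_radius(area), 2))
```

The radii for 64 and 65 years are almost identical, which matches the point about precise values being hard to judge, and the largest markers cover far more of the plot than the smallest, which is where the overlap comes from.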

7. Transparency

Transparency has similar problems to size. We're naturally drawn towards points with less transparency, and it's difficult to determine precise values.

8. Shape

Using different shapes is a fourth possibility. This requires cutting the range of life expectancies into groups. One shape corresponds to life expectancies between 50 and 60, another shape to life expectancies between 60 and 70, and so on. This isn't ideal because shapes have no natural ordering from smallest to largest. For example, a square isn't implicitly greater than a circle. To interpret this plot, you have to memorize which shape corresponds to which life expectancy range. This is a big mental burden.
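Cutting a continuous variable into groups and giving each group a shape can be sketched as follows. The marker names, the half-open bins (60 goes into the 60 to 70 group), and the function names are assumptions for illustration.

```python
import math

# Hypothetical marker names. Their order in this list is arbitrary,
# which is exactly the problem: shapes have no natural ordering.
MARKERS = ["circle", "square", "triangle", "diamond"]

def bin_start(value, width=10):
    """Return the lower edge of the bin [start, start + width) holding value."""
    return int(math.floor(value / width) * width)

def assign_shapes(values, width=10, markers=MARKERS):
    """Return one marker per value, plus the legend mapping bin start to marker."""
    starts = sorted({bin_start(v, width) for v in values})
    if len(starts) > len(markers):
        raise ValueError("not enough distinct markers for the number of bins")
    legend = {start: markers[i] for i, start in enumerate(starts)}
    return [legend[bin_start(v, width)] for v in values], legend

# Made-up life expectancies in years, not values from the UN dataset.
shapes, legend = assign_shapes([52.3, 58.0, 61.5, 74.9, 80.2])
print(shapes)
print(legend)
```

The legend dictionary is the part a reader has to hold in their head: nothing about a triangle tells you it stands for a higher range than a square.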

9. Lots of panels

One final option is to draw panels for different subsets of the dataset. As with shape, you need to cut the life expectancies into categories. This is nice for determining trends across groups. In the top row, you see that the points in the 60 to 70 group are further right than the points in the 50 to 60 group, indicating more schooling. Comparing the top and bottom rows, you see that as well as moving right, the points also move upwards, indicating more income.

10. Even more panels

In this variation, each panel contains data for a life expectancy range of five years rather than ten. This gives you more precision for life expectancy, but it takes up more space, and you spend more time moving your eyes between panels, making interpretation harder.
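The trade-off between bin width and number of panels can be sketched by grouping records before plotting. The records below are made up for illustration (they are not taken from the UN dataset), and the function name is hypothetical.

```python
from collections import defaultdict

def split_into_panels(records, width):
    """Group (schooling, gni, life_exp) records into life expectancy bins.

    Returns a dict mapping (bin_start, bin_end) to the list of
    (schooling, gni) points that would be drawn in that panel.
    """
    panels = defaultdict(list)
    for schooling, gni, life_exp in records:
        start = int(life_exp // width) * width
        panels[(start, start + width)].append((schooling, gni))
    return dict(sorted(panels.items()))

# Made-up records: (years of schooling, GNI per capita, life expectancy).
records = [
    (4.1, 1500, 52.3),
    (6.0, 3200, 58.0),
    (8.2, 7000, 61.5),
    (9.0, 9500, 67.2),
    (11.5, 21000, 74.9),
    (12.8, 45000, 80.2),
]

for width in (10, 5):
    panels = split_into_panels(records, width)
    print(f"width={width}: {len(panels)} panels -> {list(panels)}")
```

Halving the bin width gives finer life expectancy ranges but more panels to compare, each holding fewer points.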

11. Other dimensions for line plots

Line plots also have a choice of aesthetics to use as additional dimensions. Color is most common, but line thickness and transparency level are options as well. These behave much as they do for points. One new aesthetic is linetype. For example, you can draw lines with dashes or dots.

12. Color

Here's the plot of technology adoption in the USA, which uses color to distinguish lines.

13. Linetype

Here's an alternative version using linetype to distinguish the lines. Unfortunately, even with just four lines, it's difficult to distinguish the different types of dashes.

14. Let's practice!

Let's move to a higher dimension.