Line plots

1. Line plots

A close relative of the scatter plot is the line plot.

2. Worldwide COVID-19 coronavirus cases

Here is a mildly terrifying scatter plot of the cumulative number of cases of COVID-19 coronavirus throughout the world in early 2020. It's OK, but we can make a better plot. Since the x-axis consists of dates, consecutive data points are connected: each day's cumulative total builds on the previous day's. That means the plot is easier to understand if we connect those data points. That is, using a line is preferable to points here.
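To make the comparison concrete, here is a minimal sketch that draws the same date-indexed series once as points and once as a line. It assumes matplotlib is installed; the counts are made up for illustration and are not the real case data, and the output filename is arbitrary.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Made-up cumulative counts for illustration only (not real case data).
start = dt.date(2020, 1, 1)
dates = [start + dt.timedelta(days=i) for i in range(10)]
cumulative = [1, 2, 4, 7, 11, 16, 22, 29, 37, 46]

fig, (ax_scatter, ax_line) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Left panel: each observation drawn as an isolated point.
ax_scatter.scatter(dates, cumulative)
ax_scatter.set_title("Scatter plot")

# Right panel: consecutive observations joined by a line.
ax_line.plot(dates, cumulative)
ax_line.set_title("Line plot")

for ax in (ax_scatter, ax_line):
    ax.set_xlabel("Date")
    ax.tick_params(axis="x", rotation=45)
ax_scatter.set_ylabel("Cumulative cases (synthetic)")

fig.tight_layout()
fig.savefig("scatter_vs_line.png")  # in an interactive session, plt.show() works too
```

With the two panels side by side, the line version should make the day-to-day progression easier to follow than the points alone.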

3. When should you use a line plot?

So the use case for line plots is the same as for scatter plots, with one extra condition: consecutive data points are connected in some way. Most commonly, the x-axis represents dates or times, like you just saw with the coronavirus cases, but that isn't a requirement.

4. Comparing multiple lines

One really useful thing about line plots is that you can draw multiple lines on the same plot and compare them. Here you can see that in February, the majority of cases of COVID-19 were reported in China, then in March the rest of the world overtook it.
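As a rough sketch of drawing several lines on shared axes, the snippet below plots two series and labels them in a legend. It assumes matplotlib is installed; both series and their names are invented for illustration and do not reproduce the figures described in the narration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Invented series for illustration: A flattens out, B keeps accelerating.
days = list(range(10))
series_a = [10, 20, 30, 38, 44, 48, 50, 51, 52, 52]
series_b = [1, 2, 4, 8, 14, 24, 38, 56, 80, 110]

fig, ax = plt.subplots(figsize=(6, 4))

# Each call to plot() adds one line to the same axes.
ax.plot(days, series_a, label="Series A (synthetic)")
ax.plot(days, series_b, label="Series B (synthetic)")

ax.set_xlabel("Day")
ax.set_ylabel("Cumulative count")
ax.legend()

fig.tight_layout()
fig.savefig("multiple_lines.png")
```

Because both lines share the same axes, the point where one series overtakes the other is visible directly as a crossing.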

5. Trend lines

One useful thing to do is to compare line plots to a trend line generated from a linear regression. This plot shows the cases reported in China in March, after quarantines and other restrictions were in place. The data line closely follows the straight trend line, indicating that the number of cases is growing linearly.
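A trend line of this kind can be sketched with ordinary least squares. The helpers `fit_line` and `r_squared` below are hypothetical names written for this example, and the data are made up; the course itself may compute its trend line differently.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up, roughly linear counts for illustration only.
days = list(range(10))
cases = [12, 15, 17, 21, 23, 27, 29, 33, 35, 38]

slope, intercept = fit_line(days, cases)
trend = [slope * d + intercept for d in days]  # values to draw as the trend line
fit_quality = r_squared(days, cases, slope, intercept)

print(f"slope={slope:.2f} per day, intercept={intercept:.2f}, R^2={fit_quality:.3f}")
```

Plotting `trend` against `days` on the same axes as the data gives the visual comparison described above; an R² close to 1 is one way to quantify "closely follows".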

6. Trend lines + log scale

By contrast, if we look at the cases outside of China, a linear trend is a really poor fit. The number of cases is growing much faster than that. By transforming the plot to use a logarithmic scale on the y-axis before fitting the trend line, you see a much better fit. A straight line on a logarithmic y-axis corresponds to exponential growth, so on a worldwide level, the number of cases is growing exponentially.
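The effect of the log transformation can be sketched numerically: fit a straight line to the raw values, then to their logarithms, and compare the fits. The helper names and the synthetic series (5 cases growing by a factor of 1.3 per day) are assumptions made for this example, not the real data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic exponential series for illustration only.
days = list(range(20))
cases = [5 * 1.3 ** d for d in days]

# Straight-line fit to the raw values.
raw_slope, raw_intercept = fit_line(days, cases)
r2_raw = r_squared(days, cases, raw_slope, raw_intercept)

# Straight-line fit after log-transforming the y values.
log_cases = [math.log(c) for c in cases]
log_slope, log_intercept = fit_line(days, log_cases)
r2_log = r_squared(days, log_cases, log_slope, log_intercept)

# A slope on the log scale is a constant multiplicative growth rate per day.
daily_growth_factor = math.exp(log_slope)

print(f"R^2 raw: {r2_raw:.3f}, R^2 log: {r2_log:.3f}")
print(f"estimated daily growth factor: {daily_growth_factor:.3f}")
```

If you are plotting with matplotlib, calling `ax.set_yscale("log")` on the axes gives the corresponding logarithmic y-axis, on which an exponential series appears as a straight line.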

7. Time x-axis doesn't always imply line plot

Just because you have a dataset where you want to look at the relationship between a numeric variable and time, it doesn't mean a line plot is the best choice. Here you can see the results of a poll by the BBC, where critics were asked to score the best hip-hop songs. In the plot, each point represents a song. Date is on the x-axis and the critics' score is on the y-axis. Because each song is not connected to the next song, it doesn't make sense to draw a line between them, and a scatter plot is the better choice. To reiterate: line plots need consecutive data values to be conceptually connected.

8. Time x-axis doesn't always imply line plot

In case you were wondering, the top-rated hip-hop song of all time was Juicy by The Notorious BIG.

9. Time x-axis doesn't always imply line plot

Conversely, you don't always need dates or times on the x-axis. Here you can see a plot of juvenile offenders in Switzerland. The x-axis is time, the y-axis is the number of offenders, and each line represents an age group. Somehow, I find this plot to be very unsatisfying for getting insights. In a later exercise, you'll see an alternative approach without a time x-axis that works better.

10. Let's practice!

Time to draw some line plots!