The relationship of two variables
1. The relationship of two variables
So far, we have been looking at one variable at a time. In this chapter, you'll look at ways to quantify the relationship between two variables.
2. Revisit the lake again
As an example, let's revisit the lake from the previous chapter, where you quantified its species richness. To further investigate the effects on species richness, you've decided to measure the distance between the lake and the nearest agricultural activity, such as crop cultivation. For this lake, you observe a species richness of 16 species and a distance of 889 meters.
3. Revisit the lake again: scatter plot
Now, you repeat this process for 30 lakes, each with its own pair of values: species richness and distance. A visualization with two numeric variables plotted like this is called a scatter plot, where each point represents an observation. A scatter plot allows you to visualize the relationship between two variables. In this case, there tend to be more species when the lake is farther away from agricultural activity. The variable of interest (here, the number of species) is placed on the y-axis by convention.
4. Correlation coefficient
Describing a relationship between two variables can take many forms. The correlation coefficient is the simplest one. It is a number between -1 and 1, where the sign corresponds to the direction of the relationship, and the magnitude corresponds to the strength of the relationship. The scatter plot on the left has a correlation coefficient of 0.86. The data points are clearly clustered around an upward line, which indicates a strong positive correlation. The plot in the middle has a correlation coefficient of -0.52, a moderate negative correlation. The relationship is less clear: the data points are a bit more spread out. When the correlation coefficient is close to 0, such as in the last plot, there is little or no linear relationship and the scatter plot looks random.
5. Trend lines
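The correlation coefficient described above can also be computed by hand. The following is a minimal Python sketch of the Pearson correlation coefficient, assuming that is the coefficient meant here. The function name `pearson_r` and the six data pairs are made up for illustration and are not the course's lake data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (distance in meters, species richness) pairs, NOT the course data.
distances = [120, 350, 600, 889, 1400, 2100]
richness = [6, 11, 14, 16, 19, 20]

r = pearson_r(distances, richness)
print(f"correlation coefficient: {r:.2f}")
```

The sign of `r` gives the direction of the relationship and its absolute value gives the strength, which matches how the three scatter plots are described.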
To further enhance the visualization, we can add a trend line. A trend line tries to summarize the relationship between two variables, so that the distances between the data points and the trend line are as small as possible. The closer your data points are to the line, the higher the absolute value of your correlation coefficient will be. Note that the trend line doesn't need to be a straight line.
6. Trend lines: linear vs. logarithmic
Here is the same species richness versus distance scatter plot again, now with a linear trend line. The trend line tries to capture the increasing species richness when distances become longer, but it does a poor job at the start and end of the plot. It looks like species richness increases more rapidly at the lower end of the x-axis, and more slowly at the higher end. We can add a so-called logarithmic trend line to the plot instead. This line allows for a rapid increase of species richness at lower distances and a more gradual increase at higher distances. Visually, this trend line follows the data more closely, so we can say that it is a better fit. You'll assess this fit more precisely in the next lesson.
7. Trend lines: predicting and extrapolating
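The comparison between a linear and a logarithmic trend line can be sketched numerically. The Python example below assumes ordinary least squares, with the logarithmic line fitted as a straight line against ln(distance). The helper names `fit_line` and `sse` and the data are hypothetical, chosen only so that richness rises quickly at first and then levels off.

```python
import math

def fit_line(us, ys):
    """Least-squares intercept a and slope b for y = a + b * u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    b = (sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
         / sum((u - mean_u) ** 2 for u in us))
    a = mean_y - b * mean_u
    return a, b

def sse(us, ys, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * u)) ** 2 for u, y in zip(us, ys))

# Hypothetical (distance in meters, species richness) pairs, NOT the course data.
distances = [120, 350, 600, 889, 1400, 2100]
richness = [6, 11, 14, 16, 19, 20]

# Linear trend line: richness = a + b * distance
a_lin, b_lin = fit_line(distances, richness)
sse_lin = sse(distances, richness, a_lin, b_lin)

# Logarithmic trend line (assumed form): richness = a + b * ln(distance)
log_distances = [math.log(d) for d in distances]
a_log, b_log = fit_line(log_distances, richness)
sse_log = sse(log_distances, richness, a_log, b_log)

print(f"linear      SSE: {sse_lin:.2f}")
print(f"logarithmic SSE: {sse_log:.2f}")
```

A smaller sum of squared distances means the line follows the points more closely, which is one simple way to put a number on "a better fit".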
A trend line does more than describe the relationship between two variables. It also allows you to make predictions: by only measuring the distance, you can make an educated guess about how many species are in the lake, without taking actual samples! For example, the trend line predicts about 19 species for a lake that is 1400 meters away from the nearest agricultural activity. You can even make predictions outside the range of the original observations. This is called extrapolation: the logarithmic model predicts around 21 species for a distance of 3000 meters, a data point that wasn't present in our original dataset.
8. Other types of trend lines
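Prediction with a trend line amounts to evaluating its formula at a new distance. The Python sketch below assumes the logarithmic trend line has the form richness = a + b * ln(distance) and recovers approximate coefficients from the two rounded predictions quoted above (about 19 species at 1400 meters, around 21 at 3000 meters). The observed distance range is not stated in the text, so the range used here is purely hypothetical, as are the function names.

```python
import math

# Assumed model form: richness = a + b * ln(distance).
# The two quoted predictions are rounded, so a and b are only approximate.
x1, y1 = 1400, 19
x2, y2 = 3000, 21
b = (y2 - y1) / (math.log(x2) - math.log(x1))
a = y1 - b * math.log(x1)

def predict_richness(distance_m):
    """Predicted species richness for a distance in meters."""
    return a + b * math.log(distance_m)

def is_extrapolation(distance_m, observed_min, observed_max):
    """True when the distance lies outside the range of the observations."""
    return not (observed_min <= distance_m <= observed_max)

# Hypothetical range of observed distances; the real range is not given.
OBSERVED_MIN_M, OBSERVED_MAX_M = 100, 2500

for d in (1400, 3000):
    outside = is_extrapolation(d, OBSERVED_MIN_M, OBSERVED_MAX_M)
    kind = "extrapolation" if outside else "within observed range"
    print(f"{d} m: {predict_richness(d):.1f} species ({kind})")
```

Flagging extrapolated predictions explicitly is a deliberate choice: the model has no data to support its shape outside the observed range, so those predictions deserve extra caution.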
Besides linear and logarithmic trend lines, there are three more trend line types that can be used in Tableau. The first is exponential, which is the inverse of a logarithmic trend line. This line describes values that tend to increase faster at the higher end of the x-axis. You also have a power or log-log trend line, used when the relationship becomes a straight line after taking the logarithm of both variables. Lastly, you have the polynomial trend line. It takes a degree as an argument, ranging from two to eight, and can describe very complex relationships with higher degrees.
9. Let's practice!
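Exponential and power trend lines can both be fitted with the same straight-line machinery after a logarithmic transformation. The Python sketch below assumes the usual forms y = A * exp(b * x) for the exponential model and y = A * x^b for the power model. The data are synthetic, generated from known parameters so the recovered values can be checked, and `fit_line` is a hypothetical helper.

```python
import math

def fit_line(us, vs):
    """Least-squares intercept and slope for v = intercept + slope * u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    slope = (sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
             / sum((u - mean_u) ** 2 for u in us))
    return mean_v - slope * mean_u, slope

xs = [1, 2, 3, 4, 5, 6]

# Exponential: y = A * exp(b * x)  <=>  ln(y) = ln(A) + b * x
ys_exp = [2.0 * math.exp(0.3 * x) for x in xs]  # synthetic, A = 2, b = 0.3
ln_a, exp_rate = fit_line(xs, [math.log(y) for y in ys_exp])
exp_scale = math.exp(ln_a)

# Power (log-log): y = A * x**b  <=>  ln(y) = ln(A) + b * ln(x)
ys_pow = [3.0 * x ** 0.5 for x in xs]  # synthetic, A = 3, b = 0.5
ln_a, pow_exponent = fit_line([math.log(x) for x in xs],
                              [math.log(y) for y in ys_pow])
pow_scale = math.exp(ln_a)

print(f"exponential: A = {exp_scale:.3f}, b = {exp_rate:.3f}")
print(f"power:       A = {pow_scale:.3f}, b = {pow_exponent:.3f}")
```

Both transformations require positive y values, and the power model also requires positive x values, because the logarithm is undefined otherwise.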
We'll come back to trend lines in the next lesson. For now, let's test whether you can guess the correlation!