
Calculating correlation coefficients

1. Calculating correlation coefficients

We will analyze relationships between variables numerically, through correlation coefficients, and visually, with Seaborn heatmaps.

2. What is a correlation coefficient?

But first, what is a correlation coefficient? It is a numerical measure of the statistical relationship between two variables; in other words, how one variable changes as the other changes. Its value ranges from negative one to one. Negative one indicates a perfect negative linear relationship: as one variable increases, the other decreases. One indicates a perfect positive linear relationship: as one variable increases, the other also increases. Values close to negative one or one indicate strong relationships, and zero indicates no linear relationship: a change in one variable is not associated with a consistent change in the other.
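To make the definition concrete, here is a minimal sketch of the most common correlation coefficient, Pearson's r, which is also what pandas' .corr() computes by default. It divides the sum of the products of each variable's deviations from its mean by the product of the two variables' spreads. The helper name pearson_r and the small input lists are made up for illustration.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Deviations of each value from its variable's mean
    dx = x - x.mean()
    dy = y - y.mean()
    # Sum of co-deviations divided by the product of the two spreads
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0: y rises exactly in step with x
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0: y falls exactly as x rises
print(pearson_r(x, [1, 3, 2, 3, 1]))   # 0.0: no linear relationship

# Cross-check against NumPy's built-in correlation matrix
print(np.corrcoef(x, [2, 4, 6, 8, 10])[0, 1])
```

The three calls land exactly on the three reference points described above: one, negative one, and zero.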

3. Correlation coefficient example #1

Returning to some examples we saw in previous videos, this scatter plot, with a positive relationship, has a correlation of zero point eight.

4. Correlation coefficient example #2

This scatter plot with a negative relationship has a correlation of negative zero point eight.

5. Correlation coefficient example #3

Finally, this scatter plot, which showed more of a random blob of data points, has a correlation of zero.
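The three scatter plots can be approximated with simulated data. This sketch does not reproduce the course's datasets; it assumes standard normal variables and builds y as 0.8 times x plus independent noise scaled so that the population correlation is exactly 0.8 (or negative 0.8). The sample correlations will be close to, but not exactly, those values.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 500
x = rng.normal(size=n)
noise = rng.normal(size=n)

# 0.8 * x + 0.6 * noise has variance 0.64 + 0.36 = 1 and covariance 0.8 with x,
# so its population correlation with x is 0.8 (the sample value will be close)
samples = {
    "positive": 0.8 * x + 0.6 * noise,
    "negative": -0.8 * x + 0.6 * noise,
    "unrelated": rng.normal(size=n),  # drawn independently of x
}

results = {}
for name, y in samples.items():
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(x, y)
    results[name] = np.corrcoef(x, y)[0, 1]
    print(f"{name:>9}: r = {results[name]:.2f}")
```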

6. Correlation matrix

Sometimes we want to view not only the correlation of one pair of variables but of every pairwise combination, similar to a pair plot visualization. To do this, we can create a correlation matrix. A correlation matrix is simply a table with the same number of rows and columns, one for every numeric variable in the dataset. Each cell shows the correlation between the corresponding row variable and column variable.
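In pandas, a correlation matrix comes from calling .corr() on a DataFrame. The toy DataFrame below is hypothetical: the Income and MntWines column names echo the slides, the Kids column is invented, and all values are made up, so the resulting numbers will not match the course's matrix.

```python
import pandas as pd

# Hypothetical toy data: column names echo the slides, values are made up
df = pd.DataFrame({
    "Income": [30000, 45000, 52000, 61000, 75000, 88000],
    "MntWines": [40, 120, 150, 300, 420, 610],
    "Kids": [2, 1, 1, 1, 0, 0],
})

# .corr() computes Pearson correlations between every pair of numeric columns
corr = df.corr()
print(corr.shape)  # (3, 3): one row and one column per variable
print(corr.round(2))
```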

7. Correlation matrix

For example, here are the correlations in the Income row.

8. Correlation matrix

And here is the correlation between Income and MntWines, which is zero point seven three.
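Reading a single cell of the matrix is a label lookup with .loc, row label first and column label second. This sketch reuses a hypothetical toy DataFrame, so the value it prints is not the 0.73 from the slide.

```python
import pandas as pd

# Hypothetical toy data: column names echo the slides, values are made up
df = pd.DataFrame({
    "Income": [30000, 45000, 52000, 61000, 75000, 88000],
    "MntWines": [40, 120, 150, 300, 420, 610],
    "Kids": [2, 1, 1, 1, 0, 0],
})
corr = df.corr()

# Row label first, column label second
income_wines = corr.loc["Income", "MntWines"]
print(round(income_wines, 2))

# The same pairwise value can be computed directly from the two Series
print(round(df["Income"].corr(df["MntWines"]), 2))
```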

9. Correlation heatmap

A correlation matrix can be converted into a heatmap visualization with Seaborn's heatmap() function, sns.heatmap(). This function takes a correlation matrix as its first argument, which can be created with the .corr() method on a DataFrame. Setting annot=True writes each correlation value inside its cell. Here is an example using the correlation matrix we saw in the previous slide.
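A minimal sketch of that call, assuming the same hypothetical toy DataFrame as before rather than the course's data. A fresh figure is created so the heatmap does not draw over an earlier plot, and the output filename is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: column names echo the slides, values are made up
df = pd.DataFrame({
    "Income": [30000, 45000, 52000, 61000, 75000, 88000],
    "MntWines": [40, 120, 150, 300, 420, 610],
    "Kids": [2, 1, 1, 1, 0, 0],
})
corr = df.corr()

fig, ax = plt.subplots()
# Pass the correlation matrix first; annot=True writes each value in its cell
sns.heatmap(corr, annot=True, ax=ax)
ax.set_title("Correlation heatmap (toy data)")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")  # in a notebook, plt.show() displays it instead
```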

10. Correlation heatmap example

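A correlation heatmap is fantastic for a quick view of which variables are highly correlated with each other. With the default color map, lower correlation values are shown in darker colors and higher values in lighter colors, so strongly positive pairs stand out as light cells. Keep in mind that a strong negative correlation also appears dark, so check the sign in the annotations rather than reading every dark cell as a weak relationship.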
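One way to make strong negative correlations as visible as strong positive ones is to fix the color scale to the full range of negative one to one and use a diverging color map. This is a sketch of that option, not necessarily what the course uses; vmin, vmax, and cmap are standard sns.heatmap() parameters, "coolwarm" is one of Matplotlib's built-in diverging maps, and the data is the same hypothetical toy DataFrame.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: column names echo the slides, values are made up
df = pd.DataFrame({
    "Income": [30000, 45000, 52000, 61000, 75000, 88000],
    "MntWines": [40, 120, 150, 300, 420, 610],
    "Kids": [2, 1, 1, 1, 0, 0],
})
corr = df.corr()

fig, ax = plt.subplots()
# Pin the color scale to [-1, 1] and use a diverging map, so values near -1
# and near +1 both get strong colors while values near 0 stay pale
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
fig.tight_layout()
fig.savefig("correlation_heatmap_diverging.png")
```

Pinning the range also keeps colors comparable across different datasets, since the same color always means the same correlation value.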

11. Correlation heatmap example

The diagonal of the correlation matrix is always all ones, because each diagonal cell is a variable's correlation with itself.
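Both the all-ones diagonal and a related property, symmetry (the correlation of A with B equals the correlation of B with A), can be checked directly. This sketch again assumes the hypothetical toy DataFrame, and it compares with a floating-point tolerance rather than exact equality.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names echo the slides, values are made up
df = pd.DataFrame({
    "Income": [30000, 45000, 52000, 61000, 75000, 88000],
    "MntWines": [40, 120, 150, 300, 420, 610],
    "Kids": [2, 1, 1, 1, 0, 0],
})
matrix = df.corr().to_numpy()

diagonal = np.diag(matrix)
print(diagonal)  # each variable correlated with itself
print(np.allclose(diagonal, 1.0))      # True (up to floating-point rounding)
print(np.allclose(matrix, matrix.T))   # True: the matrix is symmetric
```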

12. Correlation does not mean causation

I would be remiss not to mention that correlation does not mean causation. You may have heard this before, but if you haven't, it means that just because two variables are strongly correlated, we cannot conclude that one causes the other. To make a causal claim, we need more rigorous experimentation, which this course will not cover. Here is an example. From the chart, we can see that the amount of oil produced in the United States rises and falls along with the number of songs in the Rolling Stone Greatest Songs of All Time. Visually, there is a clear relationship, but it is obviously due to coincidence rather than one thing causing the other.

13. Let's practice!

Let's see how to calculate the correlation coefficient in Power BI and Python!
