1. More non-parametric tests: Spearman correlation
Now we're going to look at correlation tests and learn about a non-parametric test for correlation.
2. Correlation
Correlation tests allow us to determine whether the values of one continuous or ordinal variable are related to those of another. They tell us whether the two variables tend to move together (both increasing) or in opposite directions (one decreasing as the other increases), whether that relationship is statistically significant, and how strongly variation in one predicts variation in the other.
We previously encountered the Pearson correlation test, a parametric test that examines whether a linear relationship exists between two variables.
3. Pearson vs Spearman correlation
Just as the Wilcoxon rank-sum test is the non-parametric equivalent of the independent-samples t-test, the Spearman correlation test is the non-parametric equivalent of the Pearson test.
The Pearson test is based on the raw values of the data, while the Spearman test works with the ranks of the values in each of the two arrays, which makes it less sensitive to outliers.
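To see the rank idea in action, here is a minimal sketch on made-up data (the arrays `x` and `y` are hypothetical, chosen only for illustration). It uses SciPy's `stats.rankdata`, `stats.pearsonr`, and `stats.spearmanr` to check that Spearman's coefficient equals Pearson's r computed on the ranks, and then shows how a single outlier affects each coefficient.

```python
import numpy as np
from scipy import stats

# Hypothetical data: a perfect linear trend, except the last point is an extreme outlier
x = np.arange(1, 11, dtype=float)
y = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, -100], dtype=float)

# Spearman's coefficient is Pearson's r applied to the ranks of each array
x_ranks = stats.rankdata(x)
y_ranks = stats.rankdata(y)
rho_from_ranks = stats.pearsonr(x_ranks, y_ranks)[0]
rho = stats.spearmanr(x, y)[0]

# Pearson's r on the raw values is pulled strongly by the outlier's magnitude
r_raw = stats.pearsonr(x, y)[0]

print(f"Pearson r on raw values: {r_raw:.3f}")
print(f"Pearson r on ranks:      {rho_from_ranks:.3f}")
print(f"Spearman rho:            {rho:.3f}")
```

Here the outlier drags Pearson's r below zero (about -0.39), while rho, which only registers that the last point has the lowest rank, stays positive (about 0.45).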
Unlike the Pearson test, the Spearman test doesn't assume that the correlation will be linear.
Where the Pearson test returns an r value, Spearman returns Spearman's rho as a measure of effect size.
Keep in mind that it does still assume the relationship is monotonic: as one variable increases, the other consistently increases or consistently decreases, so the direction of the relationship, whether positive or negative, never reverses.
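A minimal sketch of why monotonicity matters, using hypothetical integer data: for `y = x ** 2` on a range symmetric around zero, `y` is completely determined by `x`, but it falls and then rises, so neither Spearman nor Pearson detects the relationship.

```python
import numpy as np
from scipy import stats

# Hypothetical data: integers symmetric around zero
x = np.arange(-10, 11)

# Non-monotonic relationship: y decreases until x = 0, then increases
y = x ** 2

rho = stats.spearmanr(x, y)[0]
r = stats.pearsonr(x, y)[0]

# Both coefficients come out at (essentially) zero despite the exact dependence
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```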
4. Pearson vs Spearman correlation
Let's look at a few simulated distributions and see where Spearman and Pearson agree and disagree. For a perfect positive linear relationship, like this one, both tests give a value of 1.
5. Pearson vs Spearman correlation
Similarly, when one variable decreases perfectly linearly as the other increases, both tests give a value of minus 1.
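The two perfectly linear cases can be reproduced with a short sketch on hypothetical, noise-free data; the slopes and intercepts below are arbitrary choices, not the values behind the lesson's plots.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11, dtype=float)

# Hypothetical perfect linear relationships in each direction
relationships = {
    "increasing": 3 * x + 2,
    "decreasing": -0.5 * x + 10,
}

results = {}
for label, y in relationships.items():
    r = stats.pearsonr(x, y)[0]
    rho = stats.spearmanr(x, y)[0]
    results[label] = (r, rho)
    print(f"{label}: Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```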
6. Pearson vs Spearman correlation
However, when the relationship is non-linear but still monotonic, Spearman's test will yield a value of larger magnitude (closer to 1 or minus 1), since it depends only on the ranks and not on the linear assumption of the Pearson test.
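As a minimal sketch of the non-linear case, assume a hypothetical exponential relationship: `y` always increases with `x`, but along a steep curve rather than a straight line.

```python
import numpy as np
from scipy import stats

# Hypothetical monotonic but strongly curved relationship
x = np.arange(0, 20, dtype=float)
y = np.exp(0.5 * x)

r = stats.pearsonr(x, y)[0]
rho = stats.spearmanr(x, y)[0]

# The ranks of x and y match exactly, so rho is 1; Pearson's r is lower
# because a straight line fits an exponential curve poorly
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```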
7. Pearson vs Spearman correlation
Finally, when no correlation is present, the two tests give similar values, both close to zero.
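A sketch of the no-correlation case, assuming two hypothetical, independently generated normal samples; the seed and sample size are arbitrary, so the exact coefficients will vary with them.

```python
import numpy as np
from scipy import stats

# Hypothetical independent samples: no true relationship between x and y
rng = np.random.default_rng(seed=0)
x = rng.normal(size=500)
y = rng.normal(size=500)

r, p_pearson = stats.pearsonr(x, y)
rho, p_spearman = stats.spearmanr(x, y)

# Both coefficients should typically be near zero and close to each other
print(f"Pearson r = {r:.3f} (p = {p_pearson:.3f})")
print(f"Spearman rho = {rho:.3f} (p = {p_spearman:.3f})")
```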
8. Spearman correlation example
Let's look at the change in the heights of male Olympic athletes since 1950. This scatter plot indicates that a correlation might be present.
9. Implementing a Spearman correlation
Let's run both correlation tests. We import stats from scipy and use the pearsonr function, giving it our two arrays, as we've seen previously. Our result gives Pearson's r at index 0 and our p-value at index 1.
Implementing a Spearman correlation is very similar. We use the spearmanr function and input the same two arrays. As with the Pearson output, the measure of correlation, rho, is at index 0, while the p-value is at index 1.
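The Olympic dataset isn't reproduced here, so the sketch below runs the same two calls on synthetic stand-in arrays (`x` and `y` are hypothetical, generated from a noisy logarithmic curve). It reads each result by index as described: coefficient at index 0, p-value at index 1. Recent SciPy releases also expose these values as named attributes, but indexing keeps the example consistent with the lesson.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (not the lesson's Olympic data):
# y rises with x, but along a curve rather than a straight line
rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 10, size=200)
y = np.log1p(x) + rng.normal(scale=0.05, size=200)

pearson_result = stats.pearsonr(x, y)
print("Pearson r:", pearson_result[0])
print("Pearson p-value:", pearson_result[1])

spearman_result = stats.spearmanr(x, y)
print("Spearman rho:", spearman_result[0])
print("Spearman p-value:", spearman_result[1])
```

With a curved but monotonic relationship like this one, Spearman's rho will typically come out higher than Pearson's r, although the exact numbers depend on the random seed and noise level.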
Both tests return very small p-values, so both correlations are statistically significant. However, since the relationship between the variables is non-linear, the Spearman correlation captures it better and returns a higher correlation coefficient than the Pearson method.
10. Let's practice!
Now let's try this out.