Excursion: Correlation
If you're familiar with statistics, you'll have heard of Pearson's correlation. It measures the strength of the linear relationship between two variables, say \(X\) and \(Y\). It ranges from -1 to 1. A value close to 1 indicates a strong positive association: if \(X\) is high, \(Y\) tends to be high as well. A value close to -1 indicates a strong negative association: if \(X\) is high, \(Y\) tends to be low. A Pearson correlation of 0 means there is no linear association between \(X\) and \(Y\). The variables may be independent, but a correlation of 0 does not guarantee it, because a nonlinear relationship can still exist.
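For reference, the sample version of Pearson's correlation is usually written as follows. The symbols \(x_i\), \(y_i\), \(\bar{x}\), \(\bar{y}\), \(n\) and \(r\) are notation introduced here for illustration, with \(x_i\) and \(y_i\) the paired observations of \(X\) and \(Y\) and \(\bar{x}\), \(\bar{y}\) their means:

\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]

The numerator is positive when \(X\) and \(Y\) tend to lie on the same side of their means and negative when they tend to lie on opposite sides. The denominator rescales the result so that it stays between -1 and 1.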
You can calculate the correlation between two vectors with the cor() function. Take this code, for example, which computes the correlation between the columns height and width of a fictional data frame size:
cor(size$height, size$width)
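To make the call above concrete, here is a minimal sketch that builds a small size data frame and computes the correlation in two ways. The values in the data frame are invented purely for illustration, and the names r_builtin and r_manual are hypothetical; only cor(), cov() and sd() come from base R.

```r
# Hypothetical data: these values are made up for illustration only.
size <- data.frame(
  height = c(150, 160, 170, 180, 190),
  width  = c(40, 45, 52, 58, 61)
)

# Pearson correlation via the built-in function
# (method = "pearson" is the default for cor()).
r_builtin <- cor(size$height, size$width)

# The same quantity from its definition:
# covariance divided by the product of the standard deviations.
r_manual <- cov(size$height, size$width) /
  (sd(size$height) * sd(size$width))

print(r_builtin)
print(all.equal(r_builtin, r_manual))
```

Both approaches should agree up to floating-point rounding, which is a quick way to check what cor() computes by default.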
The data you worked with in the previous exercise, international.sav, is again available in your working directory. It's now up to you to import it and carry out the correct calculations to answer the following question:
What is the correlation coefficient for the two numerical variables gdp and f_illit (female illiteracy rate)?
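One possible way to approach the question is sketched below. It assumes the foreign package is installed and uses its read.spss() function to import the file. The data frame name demo and the helper cor_between() are hypothetical names chosen for this sketch. Other import routes, such as the haven package, would work as well.

```r
# Hypothetical helper: Pearson correlation between two named columns
# of a data frame.
cor_between <- function(df, x, y) {
  cor(df[[x]], df[[y]])
}

path <- "international.sav"

if (file.exists(path)) {
  # Assumption: the foreign package is available.
  # to.data.frame = TRUE returns a data frame instead of a list.
  demo <- foreign::read.spss(path, to.data.frame = TRUE)

  # If either column contains missing values, cor() returns NA by default;
  # its `use` argument controls how missing values are handled.
  print(cor_between(demo, "gdp", "f_illit"))
} else {
  message("international.sav was not found in the working directory")
}
```

The file check is only there so the sketch fails gracefully outside the exercise environment. In the exercise itself, the import and the cor() call are the two steps that matter.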
This exercise is part of the course Intermediate Importing Data in R.
