Exercise

# Excursion: Correlation

If you're familiar with statistics, you'll have heard of Pearson's correlation. It measures the strength of the linear dependency between two variables, say \(X\) and \(Y\). It ranges from -1 to 1. A value close to 1 indicates a strong positive association: if \(X\) is high, \(Y\) tends to be high as well. A value close to -1 indicates a strong negative association: if \(X\) is high, \(Y\) tends to be low. When the Pearson correlation between two variables is 0, there is no linear association between \(X\) and \(Y\); the variables may be independent, but a correlation of 0 alone does not guarantee it.

You can calculate the correlation between two vectors with the `cor()` function. Take this code for example, which computes the correlation between the columns `height` and `width` of a fictional data frame `size`:

```
cor(size$height, size$width)
```
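To see what `cor()` returns, here is a minimal sketch that compares it with the textbook definition of Pearson's correlation (covariance divided by the product of the standard deviations). The values in `size` are made up purely for illustration; the names `r_builtin` and `r_manual` are hypothetical.

```
# Hypothetical data, invented for illustration only
size <- data.frame(
  height = c(150, 160, 170, 180, 190),
  width  = c(50, 55, 61, 64, 70)
)

# cor() computes Pearson's correlation by default
r_builtin <- cor(size$height, size$width)

# The same quantity from the definition: cov(X, Y) / (sd(X) * sd(Y))
r_manual <- cov(size$height, size$width) /
  (sd(size$height) * sd(size$width))

# Both approaches should agree up to floating-point tolerance
all.equal(r_builtin, r_manual)
```

Because `width` rises almost in lockstep with `height` in this toy data, the coefficient comes out close to 1.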

The data you worked with in the previous exercise, `international.sav`, is again available in your working directory. It's now up to you to import it and carry out the right calculations to answer the following question:

*What is the correlation coefficient for the two numerical variables `gdp` and `f_illit` (female illiteracy rate)?*
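One possible way to approach this is sketched below. It assumes the haven package's `read_sav()` is used for the import (the foreign package's `read.spss()` would be an alternative), and that dropping incomplete rows with `use = "complete.obs"` is acceptable; whether the data contain missing values is not stated here. The helper `cor_columns()` and the object name `international` are hypothetical.

```
# Hypothetical helper: Pearson correlation between two named columns
cor_columns <- function(df, x, y) {
  # as.numeric() strips any label attributes carried over from SPSS;
  # use = "complete.obs" is an assumption: rows with NA are dropped
  cor(as.numeric(df[[x]]), as.numeric(df[[y]]), use = "complete.obs")
}

# Guarded so the sketch still runs when the file or package is missing
if (file.exists("international.sav") &&
    requireNamespace("haven", quietly = TRUE)) {
  international <- haven::read_sav("international.sav")
  print(cor_columns(international, "gdp", "f_illit"))
}
```

The sign of the printed value tells you the direction of the association, and its absolute value tells you how strong the linear relationship is.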

Instructions


##### Possible Answers

- The correlation is very close to 0. Therefore, there is no association between female illiteracy and GDP in the data set used.
- The correlation is around -0.45. There is a negative correlation, but it is rather weak.
- The correlation is almost equal to +1. GDP and female illiteracy are almost perfectly positively correlated.
- The correlation is around +0.45. There is a positive correlation, but it is rather weak.