Exercise

# Correlation Strength

Intuitively, we can look at the plots provided and "see" whether the two variables seem to "vary together".

- Data Set A: x and y change together and appear to have a strong relationship.
- Data Set B: there is a rough upward trend; x and y appear only loosely related.
- Data Set C: looks like random scatter; x and y do not appear to change together and are unrelated.

Recall that a deviation is the difference between a value and the mean, and that we normalized the deviations by dividing them by the standard deviation. In this exercise you will compare the 3 data sets by computing the correlation for each and determining which data set has the most strongly correlated variables x and y. Use the provided data table `data_sets`, a dictionary of records, each having keys 'name', 'x', 'y', and 'correlation'.
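As a minimal sketch of the idea recalled above, correlation can be written as the mean of the products of the normalized deviations. The version below is an illustration, not the exercise's reference solution: it assumes plain Python with the standard `statistics` module and uses the population standard deviation, which is an assumption about how the deviations were normalized.

```python
from statistics import fmean, pstdev


def correlation(x, y):
    """Return the mean of the products of the normalized deviations of x and y."""
    x = [float(v) for v in x]
    y = [float(v) for v in y]

    # Deviations: how far each value lies from the mean of its own variable
    x_mean, y_mean = fmean(x), fmean(y)
    x_dev = [v - x_mean for v in x]
    y_dev = [v - y_mean for v in y]

    # Normalize by the (population) standard deviation -- an assumption here
    x_std, y_std = pstdev(x), pstdev(y)
    x_norm = [d / x_std for d in x_dev]
    y_norm = [d / y_std for d in y_dev]

    # Correlation is the average of the pairwise products
    return fmean([a * b for a, b in zip(x_norm, y_norm)])
```

With this definition, two variables that rise together in a perfectly straight line give a value of 1, and two that move in exactly opposite directions give -1.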

Instructions

- Complete the function definition for `correlation()` using the mean of the products of the normalized deviations of `x` and `y`.
- Iterate over `data_sets`, computing and storing each correlation using `correlation(record['x'], record['y'])`.
- Run the code up until this point (i.e. the end of the for loop), and inspect the printout. Which dataset has the strongest correlation?
- Assign the name of the dataset (`data_sets['A']`, `data_sets['B']`, or `data_sets['C']`) with the strongest correlation to the variable `best_data`.
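The steps above might be sketched as follows. The tiny `data_sets` dictionary here is a hypothetical stand-in with made-up values, not the data provided in the exercise, and picking the record by largest absolute correlation is one possible way to read "strongest"; in the exercise itself you inspect the printout and assign `best_data` by hand.

```python
from statistics import fmean, pstdev


def correlation(x, y):
    """Mean of the products of the normalized deviations of x and y."""
    x_mean, y_mean = fmean(x), fmean(y)
    x_std, y_std = pstdev(x), pstdev(y)  # population standard deviation (assumed)
    products = [((a - x_mean) / x_std) * ((b - y_mean) / y_std)
                for a, b in zip(x, y)]
    return fmean(products)


# Hypothetical toy records with the same keys as the exercise's data_sets
data_sets = {
    'A': {'name': 'A', 'x': [1, 2, 3, 4, 5], 'y': [2, 4, 6, 8, 10], 'correlation': None},
    'B': {'name': 'B', 'x': [1, 2, 3, 4, 5], 'y': [2, 1, 4, 3, 5], 'correlation': None},
    'C': {'name': 'C', 'x': [1, 2, 3, 4, 5], 'y': [3, 5, 1, 4, 2], 'correlation': None},
}

# Compute and store the correlation for each record, then print it
for name, record in data_sets.items():
    record['correlation'] = correlation(record['x'], record['y'])
    print(f"data set {name}: correlation = {record['correlation']:.2f}")

# Select the record whose correlation has the largest magnitude
best_name = max(data_sets, key=lambda n: abs(data_sets[n]['correlation']))
best_data = data_sets[best_name]
```

Comparing absolute values matters because a strongly negative correlation is just as strong a relationship as a strongly positive one.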