Exercise

Correlation Strength

Intuitively, we can look at the plots provided and "see" whether the two variables seem to "vary together".

  • Data Set A: x and y change together and appear to have a strong relationship.
  • Data Set B: there is a rough upward trend; x and y appear only loosely related.
  • Data Set C: looks like random scatter; x and y do not appear to change together and seem unrelated.

[Figures: plots of Data Set A, Data Set B, and Data Set C]

Recall that a deviation is the difference between a value and the mean, and that we normalized the deviations by dividing them by the standard deviation. In this exercise you will compare the three data sets by computing the correlation for each and determining which data set has the most strongly correlated variables x and y. Use the provided data table data_sets, a dictionary of records, each having keys 'name', 'x', 'y', and 'correlation'.
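As a reminder of the normalization step, here is a minimal sketch. It assumes the population standard deviation and uses only Python's standard statistics module; the exercise environment may use a different library. The helper name normalized_deviations and the values in x_demo are hypothetical and are not part of the exercise data.

```python
from statistics import fmean, pstdev

def normalized_deviations(values):
    """Return (v - mean) / std for each v, using the population standard deviation."""
    mu = fmean(values)       # mean of the values
    sigma = pstdev(values)   # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]

# Illustrative values only; not taken from the exercise data
x_demo = [2.0, 4.0, 6.0, 8.0]
z_demo = normalized_deviations(x_demo)
print(z_demo)
```

After this step the values have mean 0 and standard deviation 1, so products of normalized deviations from two variables can be compared across data sets with different scales.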

Instructions

  • Complete the function definition for correlation() using the mean of the products of the normalized deviations of x and y.
  • Iterate over data_sets, computing and storing each correlation using correlation(record['x'], record['y']).
  • Run the code up until this point (i.e. the end of the for loop), and inspect the printout. Which dataset has the strongest correlation?
  • Assign the data set with the strongest correlation (data_sets['A'], data_sets['B'], or data_sets['C']) to the variable best_data.
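The steps above could be sketched as follows. This is one possible approach, not the exercise's reference solution. The data_sets dictionary here is a small hypothetical stand-in with made-up values (the real table is provided by the exercise), the population standard deviation is assumed, and picking best_data with max() over absolute correlations is an added convenience; the instructions ask you to choose it by inspecting the printout.

```python
from statistics import fmean, pstdev

def correlation(x, y):
    """Mean of the products of the normalized deviations of x and y."""
    mu_x, mu_y = fmean(x), fmean(y)
    sd_x, sd_y = pstdev(x), pstdev(y)  # population standard deviation (assumption)
    products = [((a - mu_x) / sd_x) * ((b - mu_y) / sd_y) for a, b in zip(x, y)]
    return fmean(products)

# Hypothetical stand-in for the provided data_sets; values are made up
x_vals = [1.0, 2.0, 3.0, 4.0, 5.0]
data_sets = {
    'A': {'name': 'A', 'x': x_vals, 'y': [1.1, 1.9, 3.2, 3.9, 5.1], 'correlation': None},
    'B': {'name': 'B', 'x': x_vals, 'y': [2.0, 1.0, 4.0, 3.0, 5.0], 'correlation': None},
    'C': {'name': 'C', 'x': x_vals, 'y': [3.0, 5.0, 1.0, 4.0, 2.0], 'correlation': None},
}

# Compute and store the correlation for each record, then print it
for name, record in data_sets.items():
    record['correlation'] = correlation(record['x'], record['y'])
    print(f"data set {name} has correlation {record['correlation']:.2f}")

# Strongest correlation = largest absolute value (assumed selection rule)
best_data = max(data_sets.values(), key=lambda r: abs(r['correlation']))
```

With these made-up values, best_data is the record for data set A. Comparing absolute values matters because a strong negative correlation is still a strong relationship.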