Correlation Strength

Intuitively, we can look at the plots provided and "see" whether the two variables seem to "vary together".

Data Set A: x and y change together and appear to have a strong relationship.
Data Set B: there is a rough upward trend; x and y appear only loosely related.
Data Set C: looks like random scatter; x and y do not appear to change together and are unrelated.

[Figures: scatter plots of Data Set A, Data Set B, and Data Set C]

Recall that a deviation is the difference between a value and the mean, and that we normalized the deviations by dividing them by the standard deviation. In this exercise you will compare the three data sets by computing the correlation of each and determining which one has the most strongly correlated variables x and y. Use the provided data table data_sets, a dictionary of records, each having the keys 'name', 'x', 'y', and 'correlation'.
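
For reference, the normalization recalled above and the correlation built from it can be written out as follows. The symbols (n for the number of points, a bar for the mean, sigma for the standard deviation, z for a normalized deviation, r for the correlation) are notation chosen here, not taken from the exercise.

```latex
z_{x,i} = \frac{x_i - \bar{x}}{\sigma_x}, \qquad
z_{y,i} = \frac{y_i - \bar{y}}{\sigma_y}, \qquad
r = \frac{1}{n} \sum_{i=1}^{n} z_{x,i}\, z_{y,i}
```

Because r is a plain mean over n points, the standard deviations are assumed here to be computed with the same 1/n normalization.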

Complete the function definition for correlation() using the mean of the products of the normalized deviations of x and y.
Iterate over data_sets, computing and storing each correlation using correlation(record['x'], record['y']).
Run the code up until this point (i.e. the end of the for loop), and inspect the printout. Which dataset has the strongest correlation?
Assign the dataset with the strongest correlation (data_sets['A'], data_sets['B'], or data_sets['C']) to the variable best_data.
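
The steps above can be sketched as follows. This is one possible solution, not necessarily the course's reference answer. The real data_sets table is provided by the exercise and is not reproduced here, so the sketch builds a hypothetical stand-in from synthetic data with the same keys. The noise levels, the seed, and the helper name best_name are all assumptions made for illustration.

```python
import numpy as np


def correlation(x, y):
    """Mean of the products of the normalized deviations of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Deviations: distance of each point from the mean
    x_dev = x - np.mean(x)
    y_dev = y - np.mean(y)
    # Normalize by the standard deviation (np.std defaults to the 1/n form)
    x_norm = x_dev / np.std(x)
    y_norm = y_dev / np.std(y)
    return np.mean(x_norm * y_norm)


# Hypothetical stand-in for the provided data_sets (synthetic, for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
data_sets = {
    'A': {'name': 'A', 'x': x, 'y': 2 * x + rng.normal(0, 1, x.size), 'correlation': None},
    'B': {'name': 'B', 'x': x, 'y': 2 * x + rng.normal(0, 10, x.size), 'correlation': None},
    'C': {'name': 'C', 'x': x, 'y': rng.normal(0, 10, x.size), 'correlation': None},
}

# Compute and store the correlation of each record, then print it
for name, record in data_sets.items():
    record['correlation'] = correlation(record['x'], record['y'])
    print('data set {} has correlation {:.2f}'.format(name, record['correlation']))

# Pick the record whose correlation is largest in magnitude
best_name = max(data_sets, key=lambda k: abs(data_sets[k]['correlation']))
best_data = data_sets[best_name]
```

The exercise asks you to read the printout and assign best_data by hand. The max() call is only a programmatic cross-check, and it compares absolute values so that a strong negative correlation would also count as strong.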
