Exercise

The relationship between correlation and covariance matrices

Previously in the course, you used .cov() to obtain the covariance matrix and .corr() to obtain the correlation matrix. It's easy to confuse the two and to use the wrong one in a simulation. Let's clarify the difference!

A correlation matrix is a standardized covariance matrix: each covariance is divided by the product of the two variables' standard deviations, so every correlation coefficient lies between -1 and 1, and the diagonal entries are exactly 1.
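As a minimal sketch of what "standardized" means here, the Python below starts from a covariance matrix and divides each entry by the product of the corresponding standard deviations. In matrix form this is \(corr = D^{-1}\,cov\,D^{-1}\), where \(D\) is the diagonal matrix of standard deviations. The DataFrame df and its columns a and b are hypothetical random data, not the diabetes dataset; b is built to depend negatively on a so that the off-diagonal coefficient comes out negative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column b has a built-in negative relationship with a
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({"a": a, "b": -2 * a + rng.normal(size=200)})

cov = df.cov()               # covariance matrix (pandas uses ddof=1 by default)
std = np.sqrt(np.diag(cov))  # standard deviations taken from the diagonal (variances)

# Standardize: divide entry (i, j) by std_i * std_j
corr_from_cov = cov / np.outer(std, std)

print(corr_from_cov.round(3))
print(np.allclose(corr_from_cov, df.corr()))  # True
```

The diagonal of the result is 1 and the off-diagonal value is negative, which is why the range of a correlation coefficient has to include negative values.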

\(cov(x,y) = corr(x,y) \times std(x) \times std(y)\)

The equation above says that the covariance \(cov(x,y)\) can be calculated by multiplying the correlation coefficient \(corr(x,y)\) by the standard deviation of \(x\), \(std(x)\), and the standard deviation of \(y\), \(std(y)\). You'll test this relationship in this exercise!
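A quick numerical check of the equation, assuming two hypothetical columns x and y in a made-up DataFrame df. It also shows one easy way to get the relationship wrong: pandas' .cov() and .std() both use the sample formula (ddof=1) by default, while np.std defaults to ddof=0, so the standard deviations have to follow the same convention as the covariance. The correlation itself does not depend on ddof.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for two numeric columns x and y
rng = np.random.default_rng(1)
x = rng.normal(loc=10, scale=3, size=100)
df = pd.DataFrame({"x": x, "y": 0.5 * x + rng.normal(size=100)})

cov_xy = df["x"].cov(df["y"])    # sample covariance (ddof=1)
corr_xy = df["x"].corr(df["y"])  # Pearson correlation coefficient
std_x = df["x"].std()            # sample standard deviation (ddof=1)
std_y = df["y"].std()

# cov(x, y) = corr(x, y) * std(x) * std(y)
print(np.isclose(cov_xy, corr_xy * std_x * std_y))  # True

# Pitfall: np.std uses ddof=0 by default, which does not match the ddof=1 covariance
pop_x = np.std(df["x"].to_numpy())
pop_y = np.std(df["y"].to_numpy())
print(np.isclose(cov_xy, corr_xy * pop_x * pop_y))  # False
```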

The diabetes dataset has been loaded as a DataFrame, dia, and both pandas as pd and numpy as np have been imported for you.

Instructions

  • Calculate the covariance matrix of dia[["bmi", "tc"]], saving this as cov_dia2.
  • Calculate the correlation matrix of dia[["bmi", "tc"]], saving this as corr_dia2.
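A sketch of the two instructions, followed by a check of the relationship in matrix form: multiplying each correlation by the matching pair of standard deviations, \(cov = D\,corr\,D\), recovers the whole covariance matrix, including the variances on the diagonal. Because dia only exists inside the exercise environment, the sketch first builds a hypothetical stand-in DataFrame with the same column names (bmi, tc) from random numbers; with the real dia, that step would be skipped.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the exercise's `dia`; only the column names match
rng = np.random.default_rng(42)
bmi = rng.normal(26, 4, size=50)
dia = pd.DataFrame({"bmi": bmi, "tc": 150 + 1.5 * bmi + rng.normal(0, 30, size=50)})

# Covariance matrix of bmi and tc
cov_dia2 = dia[["bmi", "tc"]].cov()

# Correlation matrix of bmi and tc
corr_dia2 = dia[["bmi", "tc"]].corr()

# Rebuild the covariance matrix: entry (i, j) = corr(i, j) * std(i) * std(j)
std = dia[["bmi", "tc"]].std()  # sample std (ddof=1), matching .cov()
cov_rebuilt = corr_dia2 * np.outer(std, std)
print(np.allclose(cov_rebuilt, cov_dia2))  # True
```

Going from a correlation matrix and a set of standard deviations back to a covariance matrix is the direction you typically need in a simulation, since functions that draw correlated samples usually expect a covariance matrix.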