The relationship between correlation and covariance matrices
Previously in the course, you used .cov() to obtain the covariance matrix and .corr() to obtain the correlation matrix. It's easy to confuse the two with each other and use them wrongly in simulations. Let's clarify!
A correlation matrix is a standardized covariance matrix: each covariance is divided by the standard deviations of the two variables involved, so the correlation coefficients take values from -1 to 1.
\(cov(x,y) = corr(x,y) \times std(x) \times std(y)\)
The equation above tells us that \(cov(x,y)\), the covariance value, can be calculated by multiplying the correlation coefficient \(corr(x,y)\) by the standard deviation of \(x\), \(std(x)\), and the standard deviation of \(y\), \(std(y)\). You'll test out this relationship in this exercise!
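As a quick check of the equation above, here is a minimal sketch on synthetic data. The DataFrame `demo` and its columns `a` and `b` are made up for illustration and are not part of the diabetes dataset.

```python
import numpy as np
import pandas as pd

# Synthetic, hypothetical data: b depends linearly on a, plus noise
rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = 2.0 * a + rng.normal(size=300)
demo = pd.DataFrame({"a": a, "b": b})

# Off-diagonal entries of the covariance and correlation matrices
cov_ab = demo.cov().loc["a", "b"]
corr_ab = demo.corr().loc["a", "b"]

# Sample standard deviations (pandas uses ddof=1, matching .cov())
std_a = demo["a"].std()
std_b = demo["b"].std()

# cov(a, b) rebuilt from corr(a, b) * std(a) * std(b)
rebuilt_cov_ab = corr_ab * std_a * std_b

print(f"cov from .cov():            {cov_ab}")
print(f"corr * std(a) * std(b):     {rebuilt_cov_ab}")
```

The two printed values should agree up to floating-point rounding, because .cov(), .corr(), and .std() all use the same sample (ddof=1) normalization by default.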
The diabetes dataset has been loaded as a DataFrame, dia, and both pandas as pd and numpy as np have been imported for you.
This exercise is part of the course Monte Carlo Simulations in Python
Exercise instructions
- Calculate the covariance matrix of dia[["bmi", "tc"]], saving this as cov_dia2.
- Calculate the correlation matrix of dia[["bmi", "tc"]], saving this as corr_dia2.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Calculate the covariance matrix of bmi and tc
cov_dia2 = ____
# Calculate the correlation matrix of bmi and tc
corr_dia2 = ____
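# Standard deviations of bmi and tc (a Series indexed by column name)
std_dia2 = dia[["bmi","tc"]].std()
print(f'Covariance of bmi and tc from covariance matrix: {cov_dia2.iloc[0,1]}')
# Use .iloc for positional access; integer keys on a label-indexed Series are deprecated
print(f'Covariance of bmi and tc from correlation matrix: {corr_dia2.iloc[0,1] * std_dia2.iloc[0] * std_dia2.iloc[1]}')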
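The sample code compares a single off-diagonal entry. The same relationship can be applied to the whole matrix at once, which is a sketch of one way to do it rather than part of the exercise. The DataFrame `stand_in` below is synthetic and hypothetical; it only borrows the column names bmi and tc and does not contain the real diabetes data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dia[["bmi", "tc"]]; all values are synthetic
rng = np.random.default_rng(42)
bmi = rng.normal(loc=26, scale=4, size=400)
tc = 150 + 1.5 * bmi + rng.normal(scale=30, size=400)
stand_in = pd.DataFrame({"bmi": bmi, "tc": tc})

cov_m = stand_in.cov()
corr_m = stand_in.corr()
std_v = stand_in.std()

# Outer product gives std(x) * std(y) for every pair of columns
std_outer = np.outer(std_v, std_v)

# Element-wise product rebuilds the full covariance matrix
rebuilt_cov = corr_m * std_outer

print(cov_m)
print(rebuilt_cov)
```

The diagonal of the rebuilt matrix holds the variances, since each variable has a correlation of 1 with itself and std(x) * std(x) is the variance of x.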