
The relationship between correlation and covariance matrices

Earlier in the course, you used .cov() to obtain the covariance matrix and .corr() to obtain the correlation matrix. It's easy to confuse the two and use the wrong one in a simulation. Let's clarify!

A correlation matrix is a standardized covariance matrix: each covariance is divided by the product of the two variables' standard deviations, so the correlation coefficients take values from -1 to 1.

\(cov(x,y) = corr(x,y) \times std(x) \times std(y)\)

The equation above tells us that the covariance \(cov(x,y)\) can be calculated by multiplying the correlation coefficient \(corr(x,y)\) by the standard deviation of \(x\), \(std(x)\), and the standard deviation of \(y\), \(std(y)\). You'll test out this relationship in this exercise!
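As a quick check of this identity outside the exercise, here is a minimal sketch on synthetic data. The DataFrame `df`, its columns `x` and `y`, and the random seed are made up for illustration and are not part of the diabetes dataset.

```python
import numpy as np
import pandas as pd

# Synthetic, illustrative data (an assumption; not the diabetes dataset)
rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=2.0, size=500)
y = 0.5 * x + rng.normal(loc=0.0, scale=1.0, size=500)
df = pd.DataFrame({"x": x, "y": y})

# Off-diagonal entries of the covariance and correlation matrices
cov_xy = df.cov().loc["x", "y"]
corr_xy = df.corr().loc["x", "y"]

# Sample standard deviations (pandas uses ddof=1, matching .cov())
std_x = df["x"].std()
std_y = df["y"].std()

# Rebuild the covariance from the correlation and the standard deviations
rebuilt_cov_xy = corr_xy * std_x * std_y

print(f"cov(x, y) from .cov():        {cov_xy}")
print(f"corr(x, y) * std(x) * std(y): {rebuilt_cov_xy}")
```

The two printed values should agree up to floating-point rounding, because .cov(), .corr(), and .std() all use the same sample (ddof=1) normalization by default.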

The diabetes dataset has been loaded as a DataFrame, dia, and both pandas as pd and numpy as np have been imported for you.

This exercise is part of the course

Monte Carlo Simulations in Python


Instructions

  • Calculate the covariance matrix of dia[["bmi", "tc"]], saving this as cov_dia2.
  • Calculate the correlation matrix of dia[["bmi", "tc"]], saving this as corr_dia2.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Calculate the covariance matrix of bmi and tc
cov_dia2 = ____

# Calculate the correlation matrix of bmi and tc
corr_dia2 = ____
std_dia2 = dia[["bmi","tc"]].std()

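# Compare the covariance read directly from the covariance matrix
# with the one rebuilt from the correlation and standard deviations
print(f'Covariance of bmi and tc from covariance matrix: {cov_dia2.iloc[0, 1]}')
# Use .iloc for positional access; std_dia2 is indexed by column name
print(f'Covariance of bmi and tc from correlation matrix: {corr_dia2.iloc[0, 1] * std_dia2.iloc[0] * std_dia2.iloc[1]}')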
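The same relationship holds for every entry of the matrices at once, not just one pair. The sketch below, again on made-up data (the DataFrame `df` and its columns `a`, `b`, `c` are hypothetical), rebuilds a whole covariance matrix from the correlation matrix by scaling entry (i, j) by std_i * std_j.

```python
import numpy as np
import pandas as pd

# Synthetic, illustrative data (an assumption; not the diabetes dataset)
rng = np.random.default_rng(0)
a = rng.normal(size=300)
df = pd.DataFrame({
    "a": a,
    "b": -0.8 * a + rng.normal(scale=0.5, size=300),
    "c": rng.normal(size=300),
})

cov_m = df.cov()
corr_m = df.corr()
std = df.std().to_numpy()

# Entry (i, j) of the outer product is std_i * std_j
scale = np.outer(std, std)

# Element-wise scaling of the correlation matrix gives the covariance matrix
rebuilt_cov = corr_m * scale

print(np.allclose(rebuilt_cov.to_numpy(), cov_m.to_numpy()))
```

On the diagonal, the correlation is 1 and the scale factor is std_i squared, so the rebuilt diagonal is simply each column's variance.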