
The relationship between correlation and covariance matrices

Previously in the course, you used .cov() to obtain the covariance matrix and .corr() to obtain the correlation matrix. It's easy to confuse the two and to use one where the other is needed in a simulation. Let's clarify!
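
To see why the mix-up matters in a simulation, here is a minimal sketch with made-up parameters (the means, standard deviations, and correlation below are hypothetical, not taken from the diabetes data). Passing a correlation matrix where NumPy's multivariate_normal expects a covariance matrix silently gives every simulated variable a variance of about 1.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters for two variables (illustrative only)
means = np.array([26.0, 190.0])
stds = np.array([4.0, 35.0])
corr = np.array([[1.0, 0.25],
                 [0.25, 1.0]])

# Correct: turn the correlation matrix into a covariance matrix first
cov = corr * np.outer(stds, stds)
good = rng.multivariate_normal(means, cov, size=100_000)

# Wrong: passing the correlation matrix as if it were a covariance matrix
bad = rng.multivariate_normal(means, corr, size=100_000)

# Sample standard deviations: close to [4, 35] for good, close to [1, 1] for bad
print("sample std, using cov: ", good.std(axis=0, ddof=1))
print("sample std, using corr:", bad.std(axis=0, ddof=1))
```

Both draws share the same correlation structure; only the spread of each variable differs, which is exactly the kind of error that is easy to miss in a Monte Carlo run.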

A correlation matrix is a standardized covariance matrix: each covariance is divided by the product of the two variables' standard deviations, so every correlation coefficient lies between -1 and 1, and the diagonal entries are all 1.
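
In matrix form, standardizing means dividing each entry \(cov(x_i, x_j)\) by \(std(x_i) \times std(x_j)\). A minimal NumPy sketch, using a small hypothetical covariance matrix rather than real data:

```python
import numpy as np

# Hypothetical 3x3 covariance matrix (symmetric, positive definite)
cov = np.array([[4.0, 2.0, -1.0],
                [2.0, 9.0, 1.5],
                [-1.0, 1.5, 1.0]])

# Standard deviations are the square roots of the diagonal (the variances)
std = np.sqrt(np.diag(cov))

# Divide every entry by the product of the matching pair of standard deviations
corr = cov / np.outer(std, std)

print(np.round(corr, 3))
```

Here the standard deviations are 2, 3 and 1, so the diagonal of corr is all 1 and the (0, 2) entry is -1 / (2 × 1) = -0.5, a negative correlation, which is why the range is -1 to 1 rather than 0 to 1.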

\(cov(x,y) = corr(x,y) \times std(x) \times std(y)\)

The equation above tells us that \(cov(x,y)\), the covariance value, can be calculated by multiplying the correlation coefficient \(corr(x,y)\) by the standard deviation of \(x\), \(std(x)\), and the standard deviation of \(y\), \(std(y)\). You'll test out this relationship in this exercise!
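
The same check can be run on any DataFrame. Since dia only exists inside the exercise environment, the sketch below builds a hypothetical stand-in DataFrame, df, with columns named bmi and tc filled with random, made-up values; the pandas calls (.cov(), .corr(), .std()) are the same ones the exercise relies on.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for dia: random values, NOT the real diabetes data
bmi = rng.normal(26, 4, size=200)
tc = 150 + 1.5 * bmi + rng.normal(0, 30, size=200)
df = pd.DataFrame({"bmi": bmi, "tc": tc})

cov_mat = df[["bmi", "tc"]].cov()    # covariance matrix
corr_mat = df[["bmi", "tc"]].corr()  # correlation matrix (Pearson by default)
std = df[["bmi", "tc"]].std()        # sample standard deviations (ddof=1, like .cov())

# Covariance read directly vs. rebuilt from correlation and standard deviations
cov_direct = cov_mat.loc["bmi", "tc"]
cov_from_corr = corr_mat.loc["bmi", "tc"] * std["bmi"] * std["tc"]

print(cov_direct, cov_from_corr)
print(np.isclose(cov_direct, cov_from_corr))  # True
```

Both .cov() and .std() in pandas default to ddof=1, so the two values agree up to floating-point rounding. Mixing in np.std, which defaults to ddof=0, would make them differ by a factor of n / (n - 1).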

The diabetes dataset has been loaded as a DataFrame, dia, and both pandas as pd and numpy as np have been imported for you.

This exercise is part of the course Monte Carlo Simulations in Python.

Exercise instructions

  • Calculate the covariance matrix of dia[["bmi", "tc"]], saving this as cov_dia2.
  • Calculate the correlation matrix of dia[["bmi", "tc"]], saving this as corr_dia2.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Calculate the covariance matrix of bmi and tc
cov_dia2 = ____

# Calculate the correlation matrix of bmi and tc
corr_dia2 = ____
# Standard deviations of bmi and tc
std_dia2 = dia[["bmi", "tc"]].std()

# Both lines should print the same covariance value
print(f'Covariance of bmi and tc from covariance matrix: {cov_dia2.iloc[0, 1]}')
print(f'Covariance of bmi and tc from correlation matrix: {corr_dia2.iloc[0, 1] * std_dia2["bmi"] * std_dia2["tc"]}')