Why do we need simulations?

In the last lesson, you sampled from a multivariate normal distribution defined by the mean and covariance matrix of dia. Now, you'll use the simulated results to answer questions of interest!

You may ask: why do we perform simulations when we have historical data? Can't we just use the data itself to answer questions of interest?

This is a great question. A Monte Carlo simulation models the data with a probability distribution and lets you draw as many samples from it as you need. That gives you the whole distribution to inspect, rather than only the limited number of data points available in the historical data.

For example, you can ask: what is the 0.1st quantile (the value below which 0.1% of observations fall, that is, a quantile level of 0.001) of the age variable for the diabetes patients in our simulation? The historical data dia can't answer this reliably: it has only 442 records, and 0.1% of 442 is less than one record, so any estimate would rest on the one or two smallest values. Instead, you can leverage the results of a Monte Carlo simulation, which you'll do now!
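To make the sample-size argument concrete, here is a minimal sketch that uses synthetic stand-in values rather than the actual dia data (the mean of 50, the standard deviation of 13, and the seed are arbitrary assumptions). Under NumPy's default linear interpolation, the 0.001 quantile of 442 values always lies between the two smallest values, whereas with 10,000 draws it lies between the 10th and 11th smallest.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

# Hypothetical stand-in for one historical variable: 442 values, like dia
historical = rng.normal(loc=50, scale=13, size=442)
# Hypothetical stand-in for simulated results: 10,000 draws from the same distribution
simulated = rng.normal(loc=50, scale=13, size=10_000)

q_hist = np.quantile(historical, 0.001)
q_sim = np.quantile(simulated, 0.001)

# With the default linear interpolation, the quantile sits at
# position q * (n - 1) in the sorted sample (0-based)
pos_hist = 0.001 * (historical.size - 1)  # 0.441: between the 1st and 2nd smallest
pos_sim = 0.001 * (simulated.size - 1)    # 9.999: between the 10th and 11th smallest

hist_sorted = np.sort(historical)
sim_sorted = np.sort(simulated)

print(f"historical: position {pos_hist:.3f}, quantile {q_hist:.2f}")
print(f"simulated:  position {pos_sim:.3f}, quantile {q_sim:.2f}")
```

The estimate from 442 values depends almost entirely on the single most extreme observations, while the estimate from 10,000 draws has about ten values below it to anchor it.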

The diabetes dataset has been loaded as a DataFrame, dia, and the following libraries have been imported for you: pandas as pd, numpy as np, and scipy.stats as st.

This exercise is part of the course Monte Carlo Simulations in Python.


Exercise instructions

Calculate the 0.1st quantile (the value below which the lowest one-thousandth of observations fall, that is, a quantile level of 0.001) of the tc variable in the simulated results.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Columns used to fit the distribution and to label the simulated results
cols = ["age", "bmi", "bp", "tc", "ldl", "hdl", "tch", "ltg", "glu"]

# Covariance matrix and mean vector estimated from the historical data
cov_dia = dia[cols].cov()
mean_dia = dia[cols].mean()

# Draw 10,000 samples from the fitted multivariate normal distribution
simulation_results = st.multivariate_normal.rvs(mean=mean_dia, size=10000, cov=cov_dia)

df_results = pd.DataFrame(simulation_results, columns=cols)

# Calculate the 0.1st quantile of the tc variable
print(____)
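One possible way to fill in the blank is pandas' Series.quantile with a quantile level of 0.001. The sketch below is self-contained, so it builds a small hypothetical stand-in for dia from random numbers (three columns only, with made-up means and scales, not the real dataset). In the exercise itself you would apply only the final step to the df_results created above.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

rng = np.random.default_rng(seed=1)  # arbitrary seed

# Hypothetical stand-in for dia: 442 rows of made-up values, not the real dataset
cols = ["bmi", "bp", "tc"]
dia_stub = pd.DataFrame(
    rng.normal(loc=[26, 95, 190], scale=[4, 14, 35], size=(442, 3)),
    columns=cols,
)

# Same steps as the exercise, applied to the stand-in data
simulation_results = st.multivariate_normal.rvs(
    mean=dia_stub.mean(), cov=dia_stub.cov(), size=10000, random_state=1
)
df_results = pd.DataFrame(simulation_results, columns=cols)

# 0.1st quantile of tc: the value below which 0.1% of simulated values fall
tc_q = df_results["tc"].quantile(0.001)
print(tc_q)
```

Passing 0.001 rather than 0.1 matters here: quantile expects a level between 0 and 1, so 0.1 would return the 10th percentile instead.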
