
Why do we need simulations?

In the last lesson, you sampled from a multivariate normal distribution using the mean and covariance matrix of dia. Now, you'll answer questions of interest using the simulated results!

You may ask: why do we perform simulations when we have historical data? Can't we just use the data itself to answer questions of interest?

This is a great question. Monte Carlo simulations model the data with probability distributions, so you can draw as many samples as you need and inspect the whole distribution, rather than being limited to the data points available in the historical data.

For example, you can ask: what is the 0.1th percentile (the 0.001 quantile) of the age variable for the diabetes patients in our simulation? The historical data dia can't answer this reliably: with only 442 records, there are too few observations to resolve a one-in-a-thousand value. Instead, you can leverage the results of a Monte Carlo simulation, which you'll do now!
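To make the sample-size argument concrete, here is a minimal sketch. It does not use dia: the 442 "historical" values are a hypothetical stand-in drawn from a normal distribution with made-up parameters, and a univariate normal fit replaces the multivariate one for brevity.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

rng = np.random.default_rng(0)

# Hypothetical stand-in for one historical variable: 442 records (made-up parameters).
historical = pd.Series(rng.normal(loc=50, scale=13, size=442), name="age")

# With 442 records, the 0.001 quantile sits at position 0.001 * 441 = 0.441,
# so pandas simply interpolates between the two smallest observations.
q_hist = historical.quantile(0.001)
two_smallest = historical.nsmallest(2).to_numpy()

# Simulate 10,000 draws from a normal distribution fitted to the historical sample.
simulated = pd.Series(
    st.norm.rvs(loc=historical.mean(), scale=historical.std(),
                size=10_000, random_state=rng)
)
q_sim = simulated.quantile(0.001)

# Roughly ten simulated values lie below the 0.001 quantile, so the tail is resolved.
n_below = int((simulated < q_sim).sum())

print(q_hist, two_smallest)
print(q_sim, n_below)
```

In the small sample the estimate depends entirely on the two lowest records, whereas the simulated sample has about ten observations in that tail. The simulated answer is still only as good as the assumed distribution.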

The diabetes dataset has been loaded as a DataFrame, dia, and the following libraries have been imported for you: pandas as pd, numpy as np, and scipy.stats as st.

This exercise is part of the course

Monte Carlo Simulations in Python


Instructions

  • Calculate the 0.1th percentile (the 0.001 quantile, the value below which one-thousandth of the samples fall) of the tc variable in the simulated results.
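As a hedged illustration of the kind of call involved, the sketch below computes a 0.001 quantile on simulated multivariate normal draws. The two-column mean vector and covariance matrix are invented for the example and are not taken from dia; the point is only that pandas and NumPy expect the quantile as a fraction between 0 and 1, so 0.1% is written as 0.001.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Hypothetical means and covariance for two variables (not computed from dia).
cols = ["tc", "ldl"]
mean = np.array([190.0, 115.0])
cov = np.array([[1200.0, 900.0],
                [900.0, 925.0]])

# Draw 10,000 correlated samples; random_state fixes the seed for reproducibility.
draws = st.multivariate_normal.rvs(mean=mean, cov=cov, size=10_000, random_state=1)
df_sim = pd.DataFrame(draws, columns=cols)

# Series.quantile takes a fraction in [0, 1]: the 0.1th percentile is q=0.001.
tc_q = df_sim["tc"].quantile(0.001)

# np.quantile gives the same value with its default linear interpolation.
tc_q_np = np.quantile(df_sim["tc"], 0.001)

print(tc_q, tc_q_np)
```

Passing 0.1 instead of 0.001 would return the 10th percentile, which is a common slip with this kind of question.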

Hands-on interactive exercise

Try this exercise by completing this sample code.

cov_dia = dia[["age", "bmi", "bp", "tc", "ldl", "hdl", "tch", "ltg", "glu"]].cov()
mean_dia = dia[["age", "bmi", "bp", "tc", "ldl", "hdl", "tch", "ltg", "glu"]].mean()

simulation_results = st.multivariate_normal.rvs(mean=mean_dia, size=10000, cov=cov_dia)

df_results = pd.DataFrame(simulation_results, columns=["age", "bmi", "bp", "tc", "ldl", "hdl", "tch", "ltg", "glu"])

# Calculate the 0.1th percentile (0.001 quantile) of the tc variable
print(____)