
Inputs with correlations

1. Inputs with correlations

Once we've chosen probability distributions for our variables, how do we simulate variables with correlations? We'll draw on our knowledge of multivariate random variables from the last chapter to answer this question.

2. Parameters for a Multivariate normal simulation

Based on our MLE (maximum likelihood estimation) calculations in the last lesson, we chose the normal distribution to simulate the predictor variables in the diabetes dataset. Because several of these variables are also correlated with each other, a multivariate normal distribution is a suitable choice for our Monte Carlo simulation. A multivariate normal Monte Carlo simulation needs two sets of parameters: the mean of each variable and the covariance matrix.
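For reference, here is the standard definition of the multivariate normal distribution for k variables, together with the usual sample estimates of its two parameters. The off-diagonal entries of the covariance matrix are what let the simulation reproduce the correlations between variables; pandas' mean and cov compute exactly these estimates by default (cov divides by n − 1).

```latex
\mathbf{X} = (X_1, \dots, X_k)^\top \sim \mathcal{N}_k(\boldsymbol{\mu}, \Sigma),
\qquad \mu_i = \mathbb{E}[X_i], \qquad \Sigma_{ij} = \operatorname{Cov}(X_i, X_j)

f(\mathbf{x}) = (2\pi)^{-k/2} \det(\Sigma)^{-1/2}
  \exp\!\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)

\hat{\mu}_i = \frac{1}{n} \sum_{r=1}^{n} x_{ri}, \qquad
\hat{\Sigma}_{ij} = \frac{1}{n-1} \sum_{r=1}^{n} (x_{ri} - \hat{\mu}_i)(x_{rj} - \hat{\mu}_j)
```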

3. Parameters for a Multivariate normal simulation

We can use the pandas mean method to calculate the mean of each variable and save the results as mean_dia.
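A minimal sketch of this step. The lesson's diabetes data is not reproduced here, so the DataFrame dia below is a hypothetical stand-in with three made-up predictor columns (bmi, tc, ldl), where ldl is constructed to correlate strongly with tc. Only the call dia.mean() reflects the step described above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the diabetes predictors (NOT the real dataset):
# three columns, with "ldl" constructed to correlate strongly with "tc".
rng = np.random.default_rng(0)
n = 442
tc = rng.normal(190, 35, n)
dia = pd.DataFrame({
    "bmi": rng.normal(26, 4.4, n),
    "tc": tc,
    "ldl": 0.9 * tc - 55 + rng.normal(0, 15, n),
})

# Column-wise means: a Series indexed by variable name.
mean_dia = dia.mean()
print(mean_dia.round(1))
```

Because mean_dia is a Series indexed by column name, each mean stays attached to its variable.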

4. Parameters for a Multivariate normal simulation

Then we use the cov method to calculate the covariance matrix of the predictor variables and save it as cov_dia.
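A minimal sketch, again using a hypothetical stand-in DataFrame dia in place of the real diabetes predictors. The result is a square matrix whose diagonal holds each variable's variance and whose off-diagonal entries hold the pairwise covariances.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the diabetes predictors (NOT the real dataset).
rng = np.random.default_rng(0)
n = 442
tc = rng.normal(190, 35, n)
dia = pd.DataFrame({
    "bmi": rng.normal(26, 4.4, n),
    "tc": tc,
    "ldl": 0.9 * tc - 55 + rng.normal(0, 15, n),
})

# Sample covariance matrix (pandas uses the n - 1 denominator by default).
cov_dia = dia.cov()
print(cov_dia.round(1))
```

The large positive tc/ldl entry is the information the simulation needs in order to keep those two variables correlated.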

5. Code for simulation

After obtaining the two sets of parameters in mean_dia and cov_dia, we pass them to st.multivariate_normal.rvs, where st is the scipy.stats module, to run a Monte Carlo simulation using the multivariate normal distribution. Here we draw 2,000 samples and save the results into a DataFrame called df_results.
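A minimal sketch of the simulation step, assuming the hypothetical stand-in DataFrame dia from the previous examples in place of the real data. The random_state argument is an addition for reproducibility; the lesson does not say whether a seed was used.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Hypothetical stand-in for the diabetes predictors (NOT the real dataset).
rng = np.random.default_rng(0)
n = 442
tc = rng.normal(190, 35, n)
dia = pd.DataFrame({
    "bmi": rng.normal(26, 4.4, n),
    "tc": tc,
    "ldl": 0.9 * tc - 55 + rng.normal(0, 15, n),
})
mean_dia = dia.mean()
cov_dia = dia.cov()

# Draw 2,000 correlated samples; each row is one simulated set of predictors.
samples = st.multivariate_normal.rvs(mean=mean_dia, cov=cov_dia, size=2000, random_state=1)
df_results = pd.DataFrame(samples, columns=dia.columns)
print(df_results.head())
```

rvs returns a plain NumPy array of shape (2000, 3), so the column names are reattached when building df_results.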

6. Pairplot of simulation results

Let's examine the simulated data with pairplots. Our simulated data is plotted on the left and the historical data on the right.

7. Pairplot of simulation results

Take a look at the pairwise scatterplot of the fourth and fifth variables, tc and ldl. There is a clear positive correlation between them, just as in the historical data on the right: the strong patterns in the original data are preserved in the simulation results. The simulated plot also has many more points, because we are sampling from a full probability distribution rather than being limited to the observed rows. In principle, we can generate as many samples as we like, which is very useful!
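The visual check can be complemented with a numerical one: compare the tc/ldl correlation in the historical and simulated data. This sketch uses the same hypothetical stand-in data as before, not the lesson's dataset. If the pairplots were made with seaborn (an assumption), seaborn.pairplot(df_results) would draw the graphical version.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Hypothetical stand-in for the diabetes predictors (NOT the real dataset).
rng = np.random.default_rng(0)
n = 442
tc = rng.normal(190, 35, n)
dia = pd.DataFrame({
    "bmi": rng.normal(26, 4.4, n),
    "tc": tc,
    "ldl": 0.9 * tc - 55 + rng.normal(0, 15, n),
})
df_results = pd.DataFrame(
    st.multivariate_normal.rvs(mean=dia.mean(), cov=dia.cov(), size=2000, random_state=1),
    columns=dia.columns,
)

# The simulated correlation should be close to the historical one.
hist_corr = dia.corr().loc["tc", "ldl"]
sim_corr = df_results.corr().loc["tc", "ldl"]
print(f"tc-ldl correlation: historical {hist_corr:.2f}, simulated {sim_corr:.2f}")
print(len(dia), "historical rows vs", len(df_results), "simulated rows")
```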

8. Calculate the predicted y

In a Monte Carlo simulation, we only simulate the predictor variables. So what about y, the response variable? We already have a predictive regression model, regr_model, built on the diabetes dataset, which returns predictions given inputs. It is deterministic: given the same input, it always returns the same prediction, so this is a regular deterministic calculation step. We don't always need a regression model, or any model, for this step; if we had a formula for calculating y, we could use that instead. Feeding the simulated data to regr_model yields a predicted y value for each row in the df_results DataFrame. The first five predicted y values are shown as examples.
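A minimal sketch of the deterministic step. The lesson's regr_model is not available here, so this fits a hypothetical stand-in, an ordinary least-squares model with NumPy, to a made-up response y_hist, and wraps it in a hypothetical helper predict_y. Assuming regr_model follows the scikit-learn estimator interface, regr_model.predict(df_results) would play the role of predict_y.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Hypothetical stand-in for the diabetes predictors (NOT the real dataset).
rng = np.random.default_rng(0)
n = 442
tc = rng.normal(190, 35, n)
dia = pd.DataFrame({
    "bmi": rng.normal(26, 4.4, n),
    "tc": tc,
    "ldl": 0.9 * tc - 55 + rng.normal(0, 15, n),
})
# Hypothetical historical response, made up so there is something to fit.
y_hist = 30 + 9 * dia["bmi"] - 0.3 * dia["tc"] + 0.4 * dia["ldl"] + rng.normal(0, 40, n)

# Stand-in for regr_model: least squares with an intercept column.
X = np.column_stack([np.ones(n), dia.to_numpy()])
coef, *_ = np.linalg.lstsq(X, y_hist.to_numpy(), rcond=None)

def predict_y(df):
    """Deterministic prediction: identical rows always give identical outputs."""
    return coef[0] + df[dia.columns].to_numpy() @ coef[1:]

# Simulated predictors, as in the earlier step.
df_results = pd.DataFrame(
    st.multivariate_normal.rvs(mean=dia.mean(), cov=dia.cov(), size=2000, random_state=1),
    columns=dia.columns,
)
predicted_y = predict_y(df_results)
print(predicted_y[:5])  # first five predictions
```

All the randomness lives in df_results; the prediction step adds none, which is what makes it a deterministic calculation.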

9. Histogram of the predicted y

If we now draw a histogram of the predicted y values, this is what we see. The larger the y value, the more severe the disease progression.
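A minimal sketch of binning the predictions. To keep it short, it uses hypothetical stand-in values for mean_dia and cov_dia and, as the lesson notes is allowed, a hypothetical linear formula (made-up coefficients) in place of regr_model. It prints a text histogram; assuming matplotlib.pyplot is imported as plt, plt.hist(predicted_y, bins=20) would draw the same bins as a chart.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Hypothetical parameters standing in for mean_dia and cov_dia (not fitted to real data).
cols = ["bmi", "tc", "ldl"]
mean_dia = pd.Series([26.0, 190.0, 116.0], index=cols)
cov_dia = pd.DataFrame(
    [[19.4, 20.0, 18.0],
     [20.0, 1225.0, 1100.0],
     [18.0, 1100.0, 1217.0]],
    index=cols, columns=cols,
)
df_results = pd.DataFrame(
    st.multivariate_normal.rvs(mean=mean_dia, cov=cov_dia, size=2000, random_state=1),
    columns=cols,
)

# Hypothetical linear formula in place of regr_model (made-up coefficients).
coef = pd.Series({"bmi": 9.0, "tc": -0.3, "ldl": 0.4})
predicted_y = 30.0 + df_results[cols].to_numpy() @ coef[cols].to_numpy()

# Bin the predictions into 20 equal-width bins and print one '#' per 10 samples.
counts, edges = np.histogram(predicted_y, bins=20)
for count, left in zip(counts, edges[:-1]):
    print(f"{left:7.1f} | {'#' * int(count // 10)}")
```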

10. Simulated predictors + predicted response

Next, let's combine the simulated predictors and the predicted response into one DataFrame by adding a predicted_y column containing our predictions.
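A minimal sketch of the combining step, again with hypothetical stand-in parameters and a hypothetical linear formula in place of regr_model. Assigning a new column aligns the predictions row by row with the simulated predictors that produced them.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Hypothetical parameters standing in for mean_dia and cov_dia (not fitted to real data).
cols = ["bmi", "tc", "ldl"]
mean_dia = pd.Series([26.0, 190.0, 116.0], index=cols)
cov_dia = pd.DataFrame(
    [[19.4, 20.0, 18.0],
     [20.0, 1225.0, 1100.0],
     [18.0, 1100.0, 1217.0]],
    index=cols, columns=cols,
)
df_results = pd.DataFrame(
    st.multivariate_normal.rvs(mean=mean_dia, cov=cov_dia, size=2000, random_state=1),
    columns=cols,
)

# Hypothetical linear formula in place of regr_model (made-up coefficients).
coef = pd.Series({"bmi": 9.0, "tc": -0.3, "ldl": 0.4})
predicted_y = 30.0 + df_results[cols].to_numpy() @ coef[cols].to_numpy()

# Add the predictions as a new column next to the simulated predictors.
df_results["predicted_y"] = predicted_y
print(df_results.head())
```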

11. Recap

Let's review the Monte Carlo simulation steps now that we've done a more advanced simulation! First, we selected the variables of interest and decided to use the multivariate normal distribution, based on our MLE evaluation. We used the mean and covariance matrix of the historical diabetes dataset to simulate input variables and confirmed that the simulated results of these variables indeed looked like the historical data. We then performed a deterministic calculation to obtain the predicted y values with the help of regr_model. In the next lesson, we'll look at how to answer questions of interest using our Monte Carlo simulations!

12. Let's practice!

But first, let's practice!
