1. Sensitivity analysis
Our final lesson is about sensitivity analysis.
2. Sensitivity analysis
Sensitivity analysis helps us understand how changing the inputs across a range of values affects the outputs.
By summarizing sensitivity analysis results using tables or plots, we can clearly illustrate patterns or trends.
Let's discuss a typical sensitivity analysis question about the diabetes dataset. If we increase or decrease the values for BMI and HDL using a Monte Carlo simulation, how will the predicted y values representing disease progression change?
3. Defining the parameters
Our first step in answering this question is to calculate the mean and covariance matrix for the multivariate normal simulation.
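As a rough illustration of this step, the sketch below computes a mean vector and covariance matrix with pandas. The DataFrame here is a synthetic stand-in, not the real diabetes data, and the column names and the cov_dia name are assumptions made for the example; only mean_dia is a name used in this lesson.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the diabetes predictors (NOT the real data).
# The column names are assumptions chosen to match the narration.
rng = np.random.default_rng(0)
dia = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["age", "bmi", "bp", "hdl"],
)

# Mean vector: one entry per predictor.
mean_dia = dia.mean()

# Covariance matrix: one row and one column per predictor.
cov_dia = dia.cov()
```

These two objects are what a multivariate normal sampler needs: the means set where the samples are centered, and the covariance preserves the relationships between predictors.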
4. Defining the simulation function
We then define a simulation function called simulate_bmi_hdl which accepts a mean vector and a covariance matrix as arguments.
The code within the function is the simulation code we've seen several times.
The function returns mean predicted y values given the parameters.
We can call this function repeatedly with different parameters, changing the mean values for BMI and HDL and keeping the covariance the same to perform the sensitivity analysis.
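A minimal sketch of what such a function might look like follows. The lesson's version predicts y with a model fitted to the diabetes data; here a hypothetical linear model with made-up coefficients stands in for it, and the n_samples and seed parameters are also assumptions.

```python
import numpy as np

# Hypothetical stand-in model: predictor order, coefficients, and intercept
# are invented for illustration and are not fitted to the diabetes data.
PREDICTORS = ["age", "bmi", "bp", "hdl"]
COEF = np.array([0.5, 25.0, 10.0, -12.0])
INTERCEPT = 150.0


def simulate_bmi_hdl(mean, cov, n_samples=5000, seed=123):
    """Return the mean predicted y for samples drawn around `mean`."""
    rng = np.random.default_rng(seed)
    # Draw correlated predictor values from a multivariate normal.
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    # Predict y for every sampled row with the stand-in linear model.
    predicted_y = INTERCEPT + samples @ COEF
    return float(predicted_y.mean())


# Same covariance, different means: the core move of the sensitivity analysis.
mean = np.zeros(4)
cov = np.eye(4) * 0.01
baseline = simulate_bmi_hdl(mean, cov)
higher_bmi = simulate_bmi_hdl(mean + np.array([0.0, 1.0, 0.0, 0.0]), cov)
```

Fixing the seed means each call reuses the same random noise, so differences between calls reflect the changed means rather than sampling variation.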
5. Perform simulations with a range of input parameters
As an example, let's loop through a range of incremented mean HDL values and a range of incremented mean BMI values. In each iteration, we keep the same covariance matrix but change the mean values for BMI and HDL in the multivariate normal sampling. These HDL and BMI sampling values will be recorded in the hdl and bmi lists.
For each simulation, we build a mean_list variable by starting from our original mean_dia and updating it with the new incremented mean BMI and HDL values, leaving the other predictor means unchanged.
The simulate_bmi_hdl function outputs the mean predicted y values.
The output represented by mean_y is appended to the list simu_y which contains all mean predicted y values under different conditions.
Finally, we concatenate the mean incremented values of HDL, the mean incremented values of BMI, and the mean predicted y values into a DataFrame called df_sa, renaming its columns "hdl_inc", "bmi_inc" and "y".
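The loop described above could be sketched as follows. The increment ranges, the placeholder means and covariance, and the stand-in linear model inside simulate_bmi_hdl are all assumptions for illustration; the names hdl, bmi, mean_list, mean_y, simu_y, and df_sa follow the narration.

```python
import numpy as np
import pandas as pd

predictors = ["age", "bmi", "bp", "hdl"]        # assumed column order
mean_dia = pd.Series([0.0, 0.0, 0.0, 0.0], index=predictors)  # placeholder means
cov_dia = np.eye(4) * 0.01                      # placeholder covariance
coef = np.array([0.5, 25.0, 10.0, -12.0])       # hypothetical model coefficients


def simulate_bmi_hdl(mean, cov, n_samples=2000, seed=123):
    # Stand-in for the lesson's function: sample, predict, average.
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    return float((150.0 + samples @ coef).mean())


hdl, bmi, simu_y = [], [], []
for mean_hdl_inc in np.arange(-2, 3):
    for mean_bmi_inc in np.arange(-3, 4):
        # Start from the original means and shift only BMI and HDL.
        mean_list = mean_dia.copy()
        mean_list["bmi"] += mean_bmi_inc
        mean_list["hdl"] += mean_hdl_inc
        mean_y = simulate_bmi_hdl(mean_list.to_numpy(), cov_dia)
        # Record the condition and its result.
        hdl.append(mean_hdl_inc)
        bmi.append(mean_bmi_inc)
        simu_y.append(mean_y)

# Combine the three lists into one results table.
df_sa = pd.concat(
    [pd.Series(hdl), pd.Series(bmi), pd.Series(simu_y)], axis=1
)
df_sa.columns = ["hdl_inc", "bmi_inc", "y"]
```

Each row of df_sa is one condition: a pair of increments and the mean predicted y that the simulation returned for it.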
6. Styled DataFrames of sensitivity analysis results
Let's sort df_sa by the hdl_inc and bmi_inc columns. Then, we pivot the DataFrame to generate a table, where the rows of the table represent different incremented HDL values, the columns represent different incremented BMI values, and the values in the table are the corresponding y values.
We can style the DataFrame with a background gradient: the darker the red, the bigger the numerical values of y, and the worse the patient outcome.
As HDL increases from top to bottom of each column, the red color of the "y" values becomes lighter. This indicates smaller predicted y values and better patient outcomes as HDL increases.
Increasing BMI values from left to right of each row indicates worse patient outcomes, shown with increases in predicted y values.
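The sort, pivot, and styling steps might look like the sketch below. The df_sa values are made up so the example runs on its own, and the names df_sorted, df_pivot, style_results, and the "Reds" colormap are assumptions rather than the lesson's exact code.

```python
import pandas as pd

# Toy results in the same layout as df_sa; the y values are made up.
df_sa = pd.DataFrame({
    "hdl_inc": [1, 0, 1, 0],
    "bmi_inc": [0, 1, 1, 0],
    "y": [140.0, 175.0, 165.0, 150.0],
})

# Sort, then reshape: rows are HDL increments, columns are BMI increments.
df_sorted = df_sa.sort_values(by=["hdl_inc", "bmi_inc"])
df_pivot = df_sorted.pivot(index="hdl_inc", columns="bmi_inc", values="y")


def style_results(table):
    # Returns a pandas Styler with a color gradient over the cell values.
    # Styling needs jinja2 and matplotlib installed; "Reds" is an assumed
    # colormap in which larger values are shaded darker.
    return table.style.background_gradient(cmap="Reds")
```

In a notebook, displaying style_results(df_pivot) would render the shaded table.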
7. Hexbin plot for sensitivity analysis results
When there are many conditions and results, it can become difficult to examine the results in a DataFrame with only two dimensions. Instead, we can use the pandas DataFrame.plot.hexbin method to examine sensitivity analysis results. The x argument represents incremented HDL values, y represents incremented BMI values, and C holds the predicted y values. reduce_C_function reduces all the values in a bin to a single number using np.mean, and gridsize defines the number of hexagons in the x-direction. The cmap value sets the color scheme, and the sharex argument needs to be set to False so that x labels show up.
In this hexbin plot, the darker the color, the smaller the numeric values of the predicted y. The plot shows the same conclusions as the DataFrame: with increasing HDL values there is a decrease in predicted y values and therefore better patient outcomes; with increasing BMI values, there are worse patient outcomes.
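As a hedged sketch of the hexbin call, the block below builds a toy results grid and wraps the plot in a helper. The grid values, the plot_hexbin name, gridsize=10, and the "viridis" colormap are assumptions; the keyword arguments themselves are the ones described in this lesson.

```python
import numpy as np
import pandas as pd

# Toy results grid: every combination of a few HDL and BMI increments.
# The y formula is invented so that y rises with BMI and falls with HDL.
inc = np.arange(-3, 4)
grid = pd.DataFrame(
    [(h, b) for h in inc for b in inc], columns=["hdl_inc", "bmi_inc"]
)
grid["y"] = 150.0 + 25.0 * grid["bmi_inc"] - 12.0 * grid["hdl_inc"]


def plot_hexbin(df):
    # Requires matplotlib; returns the Axes that pandas draws on.
    return df.plot.hexbin(
        x="hdl_inc",                # horizontal axis: HDL increments
        y="bmi_inc",                # vertical axis: BMI increments
        C="y",                      # values aggregated inside each hexagon
        reduce_C_function=np.mean,  # collapse each bin to its mean
        gridsize=10,                # number of hexagons in the x-direction
        cmap="viridis",             # assumed color scheme
        sharex=False,               # keeps the x labels visible
    )
```

Which end of the color scale looks darker depends on the colormap chosen, so it is worth checking the color bar before reading the plot.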
8. Hexbin plot for dense parameter space
Our sensitivity analysis looked at only a few possible values for BMI and HDL. If we were to look at a greater number of possible values, the pattern would become even clearer in the denser parameter space.
9. Let's practice!
Let's practice our sensitivity analysis!