Non-standard estimators
In the last exercise, you ran a simple bootstrap that we will now modify for more complicated estimators.
Suppose you are studying the health of students. You are given the heights and weights of 1000 students and are interested in the median height, the correlation between height and weight, and the 95% confidence interval (CI) for each of these quantities. Let's use bootstrapping.
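The core idea can be sketched for a single statistic. Draw a sample of the same size as the data, with replacement. Recompute the statistic. Repeat many times. Then read the CI off the 2.5th and 97.5th percentiles of the replicates. The sketch below assumes a hypothetical helper, bootstrap_percentile_ci, and synthetic normally distributed heights. Neither comes from the exercise.

```python
import numpy as np

def bootstrap_percentile_ci(data, statistic, sims=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for `statistic` on a 1-D array (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Each replicate: draw n observations with replacement, recompute the statistic
    replicates = np.array([
        statistic(rng.choice(data, size=n, replace=True)) for _ in range(sims)
    ])
    # The CI bounds are empirical percentiles of the replicate distribution
    lower, upper = np.percentile(replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

# Synthetic heights in cm (an assumption, not the exercise's data)
rng = np.random.default_rng(42)
heights = rng.normal(loc=170, scale=10, size=1000)
ci = bootstrap_percentile_ci(heights, np.median)
print("95% CI for the median height:", ci)
```

Any function that maps an array to a number can be passed as statistic, which is what makes the bootstrap convenient for estimators like the median that lack a simple standard-error formula.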
Examine the pandas DataFrame df, which contains the heights and weights of the 1000 students. Using it, calculate the 95% CI for both the median height and the correlation between height and weight.
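The correlation depends on which height goes with which weight. For that reason, each bootstrap sample should resample whole rows (students) rather than each column on its own. The sketch below illustrates the difference on synthetic data. The frame demo_df, its column names, and the linear height-weight model are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: synthetic, positively correlated
# heights (cm) and weights (kg); column names are assumptions
rng = np.random.default_rng(0)
heights = rng.normal(170, 10, size=1000)
weights = 0.9 * heights - 90 + rng.normal(0, 8, size=1000)
demo_df = pd.DataFrame({"heights": heights, "weights": weights})

# Resampling whole rows keeps each student's (height, weight) pair together
rows = demo_df.sample(n=len(demo_df), replace=True, random_state=1)
paired_corr = rows["heights"].corr(rows["weights"])

# Resampling each column independently breaks the pairing; .to_numpy()
# drops the index so the two draws are not re-aligned by label
h = demo_df["heights"].sample(n=len(demo_df), replace=True, random_state=2).to_numpy()
w = demo_df["weights"].sample(n=len(demo_df), replace=True, random_state=3).to_numpy()
unpaired_corr = np.corrcoef(h, w)[0, 1]

print(f"paired resample corr:   {paired_corr:.2f}")
print(f"unpaired resample corr: {unpaired_corr:.2f}")
```

The paired resample stays close to the correlation of the original data, while the unpaired one collapses toward zero. Only the paired version estimates the sampling variability of the correlation.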
This exercise is part of the course Statistical Simulation in Python.
Exercise instructions
- Use the .sample() method on df to generate a sample of the data with replacement and assign it to tmp_df.
- For each generated dataset in tmp_df, calculate the median height and the correlation between height and weight using .median() and .corr().
- Append the median heights to height_medians and the correlations to hw_corr.
- Finally, calculate the 95% ([2.5, 97.5]) confidence intervals for each of the above quantities using np.percentile().
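One detail worth checking when using .corr() here is the shape of its result. On a DataFrame it returns a full matrix of pairwise correlations. On a Series, given another Series, it returns a single number. Only the single number can be appended to hw_corr as a scalar. The tiny frame below and its column names are hypothetical, used only to show the two shapes.

```python
import pandas as pd

# Tiny hypothetical frame, only to show the shapes of the two .corr() results
toy = pd.DataFrame({"heights": [160.0, 165.0, 170.0, 175.0, 180.0],
                    "weights": [55.0, 62.0, 66.0, 74.0, 79.0]})

matrix = toy.corr()                                # 2x2 DataFrame of pairwise correlations
from_matrix = matrix.loc["heights", "weights"]     # off-diagonal entry, a scalar
from_series = toy["heights"].corr(toy["weights"])  # same quantity, computed directly

print(matrix.shape)  # (2, 2)
print(from_matrix, from_series)
```

Either form gives the Pearson correlation. The Series form avoids building the matrix on every bootstrap iteration.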
Have a go at this exercise by completing this sample code.
# Sample with replacement and calculate quantities of interest
sims, data_size, height_medians, hw_corr = 1000, df.shape[0], [], []
for i in range(sims):
    tmp_df = ____(n=____, replace=____)
    height_medians.append(____)
    hw_corr.append(____)

# Calculate confidence intervals
height_median_ci = np.____
height_weight_corr_ci = np.____
print("Height Median CI = {} \nHeight Weight Correlation CI = {}".format(
    height_median_ci, height_weight_corr_ci))
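The sketch below shows one possible way to complete the scaffold. It runs on a synthetic stand-in for df, so the resulting intervals say nothing about the real students. The data-generating model and the column names heights and weights are assumptions. Substitute the actual column names of the exercise's df.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df (synthetic data; column names are assumptions)
rng = np.random.default_rng(123)
heights = rng.normal(170, 10, size=1000)
df = pd.DataFrame({"heights": heights,
                   "weights": 0.9 * heights - 90 + rng.normal(0, 8, size=1000)})

# Sample with replacement and calculate quantities of interest
sims, data_size, height_medians, hw_corr = 1000, df.shape[0], [], []
for i in range(sims):
    # Resample whole rows with replacement, same size as the original data
    tmp_df = df.sample(n=data_size, replace=True)
    # Statistic 1: median height of this bootstrap sample
    height_medians.append(tmp_df["heights"].median())
    # Statistic 2: Pearson correlation between height and weight (a scalar)
    hw_corr.append(tmp_df["heights"].corr(tmp_df["weights"]))

# Percentile bootstrap: the 2.5th and 97.5th percentiles bound the 95% CI
height_median_ci = np.percentile(height_medians, [2.5, 97.5])
height_weight_corr_ci = np.percentile(hw_corr, [2.5, 97.5])
print("Height Median CI = {} \nHeight Weight Correlation CI = {}".format(
    height_median_ci, height_weight_corr_ci))
```

Passing random_state=i to df.sample() would make each run reproducible; it is left out here to mirror the scaffold.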