
Non-standard estimators

In the last exercise, you ran a simple bootstrap that we will now modify for more complicated estimators.

Suppose you are studying the health of students. You are given the heights and weights of 1000 students and want to estimate the median height and the correlation between height and weight, together with a 95% confidence interval (CI) for each quantity. Let's use bootstrapping.

Examine the pandas DataFrame df with the heights and weights of 1000 students. Using it, calculate the 95% CI for both the median height and the correlation between height and weight.

This exercise is part of the course

Statistical Simulation in Python


Exercise instructions

  • Use the .sample() method on df to generate a sample of the data with replacement and assign it to tmp_df.
  • For each resampled dataset tmp_df, calculate the median height and the correlation between heights and weights using .median() and .corr().
  • Append the median height to height_medians and the correlation to hw_corr.
  • Finally, calculate the 95% ([2.5, 97.5]) confidence intervals for each of the above quantities using np.percentile().
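The last step relies on np.percentile() accepting a list of percentiles and returning one value per percentile. A minimal illustration on made-up numbers (the array values is purely hypothetical and stands in for a list of bootstrap estimates):

```python
import numpy as np

# Hypothetical stand-in for a list of bootstrap estimates: the integers 1..100
values = np.arange(1, 101)

# Passing two percentiles returns the lower and upper endpoints in one call
lower, upper = np.percentile(values, [2.5, 97.5])
print("95% interval endpoints:", lower, upper)
```

The middle 95% of the values lies between the two endpoints, which is what the percentile bootstrap interval reports.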

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Sample with replacement and calculate quantities of interest
sims, data_size, height_medians, hw_corr = 1000, df.shape[0], [], []
for i in range(sims):
    tmp_df = ____(n=____, replace=____)
    height_medians.append(____)
    hw_corr.append(____)

# Calculate confidence intervals
height_median_ci = np.____
height_weight_corr_ci = np.____
print("Height Median CI = {} \nHeight Weight Correlation CI = {}".format( height_median_ci, height_weight_corr_ci))
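One possible way to fill in the blanks is sketched below. It is not the course's official solution. Because the exercise's df is not available here, the sketch builds a synthetic stand-in, and the column names 'heights' and 'weights', the distribution parameters, and the seeds are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the exercise's df (ASSUMPTION: columns are named
# 'heights' and 'weights'; the numbers below are made up for illustration)
rng = np.random.default_rng(123)
heights = rng.normal(5.5, 0.4, size=1000)
weights = 30 * heights + rng.normal(0, 10, size=1000)
df = pd.DataFrame({"heights": heights, "weights": weights})

# Sample with replacement and calculate quantities of interest
sims, data_size, height_medians, hw_corr = 1000, df.shape[0], [], []
for i in range(sims):
    # Resample all rows with replacement; random_state=i only makes the run reproducible
    tmp_df = df.sample(n=data_size, replace=True, random_state=i)
    height_medians.append(tmp_df["heights"].median())
    hw_corr.append(tmp_df["heights"].corr(tmp_df["weights"]))

# Calculate confidence intervals as the 2.5th and 97.5th percentiles
height_median_ci = np.percentile(height_medians, [2.5, 97.5])
height_weight_corr_ci = np.percentile(hw_corr, [2.5, 97.5])
print("Height Median CI = {} \nHeight Weight Correlation CI = {}".format(
    height_median_ci, height_weight_corr_ci))
```

Sampling whole rows keeps each student's height paired with the same student's weight, which the correlation estimate depends on.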