Bootstrapping a confidence interval
A useful tool for assessing the variability of some data is the bootstrap. In this exercise, you'll write your own bootstrapping function that can be used to return a bootstrapped confidence interval.
This function takes three parameters: a 2-D array of numbers (data), a list of percentiles to calculate (percentiles), and the number of bootstrap iterations to use (n_boots). It uses the resample function from sklearn.utils to draw a bootstrap sample (rows of data sampled with replacement) and computes that sample's column means. It repeats this n_boots times, and the percentiles of the resulting means give the confidence interval.
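To make concrete what drawing a bootstrap sample means, here is a minimal sketch that does it by hand with NumPy. It assumes rows are the observations. The helper name draw_bootstrap_sample is hypothetical. By default, sklearn.utils.resample(data) performs the same kind of draw (replace=True, with as many rows as the input).

```python
import numpy as np

def draw_bootstrap_sample(data, rng):
    """Hypothetical helper: return len(data) rows drawn from data *with* replacement."""
    n_rows = data.shape[0]
    # Indices are drawn uniformly from 0..n_rows-1 and may repeat
    idx = rng.integers(0, n_rows, size=n_rows)
    return data[idx]

rng = np.random.default_rng(0)
data = np.arange(10, dtype=float).reshape(5, 2)  # 5 observations, 2 columns (toy data)
sample = draw_bootstrap_sample(data, rng)
print(sample.shape)  # same shape as data: (5, 2)
```

Because the draw is with replacement, some rows usually appear more than once in the sample and others not at all. That variation between samples is what the bootstrap measures.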
This exercise is part of the course Machine Learning for Time Series Data in Python.
Exercise instructions
- The function should loop over the number of bootstraps (given by the parameter n_boots) and:
  - Take a random sample of the data, with replacement, and calculate the mean of this random sample.
- Compute the percentiles of bootstrap_means and return them.
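The last step relies on np.percentile with axis=0, which computes each requested percentile separately for every column. Here is a small sketch of that step using made-up numbers as a stand-in for bootstrap_means, with 5 iterations and 2 columns. The result has one row per percentile and one column per data column.

```python
import numpy as np

# Made-up stand-in for bootstrap_means: 5 bootstrap iterations x 2 columns
bootstrap_means = np.array([[1.0, 10.0],
                            [2.0, 20.0],
                            [3.0, 30.0],
                            [4.0, 40.0],
                            [5.0, 50.0]])

# Row 0 holds the 2.5th percentiles, row 1 the 97.5th, one value per column
interval = np.percentile(bootstrap_means, (2.5, 97.5), axis=0)
print(interval)  # approximately [[1.1, 11.0], [4.9, 49.0]] (linear interpolation)
```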
Sample code for this exercise:
import numpy as np
from sklearn.utils import resample

def bootstrap_interval(data, percentiles=(2.5, 97.5), n_boots=100):
    """Bootstrap a confidence interval for the mean of columns of a 2-D dataset."""
    # Create our empty array to fill the results: one row per bootstrap iteration
    bootstrap_means = np.zeros([n_boots, data.shape[-1]])
    for ii in range(n_boots):
        # Draw a bootstrap sample (rows *with* replacement), then take its column means
        random_sample = resample(data)
        bootstrap_means[ii] = random_sample.mean(axis=0)

    # Compute the percentiles of choice for the bootstrapped means
    percentiles = np.percentile(bootstrap_means, percentiles, axis=0)
    return percentiles
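As a usage sketch, here is a NumPy-only variant of the same loop, applied to synthetic data. The name bootstrap_interval_np and its seed parameter are hypothetical. This variant swaps sklearn.utils.resample for NumPy's random Generator so that it runs without scikit-learn and gives reproducible results.

```python
import numpy as np

def bootstrap_interval_np(data, percentiles=(2.5, 97.5), n_boots=1000, seed=None):
    """Hypothetical NumPy-only variant of bootstrap_interval with an optional seed."""
    rng = np.random.default_rng(seed)
    n_rows = data.shape[0]
    bootstrap_means = np.zeros([n_boots, data.shape[-1]])
    for ii in range(n_boots):
        # Row indices drawn with replacement define one bootstrap sample
        idx = rng.integers(0, n_rows, size=n_rows)
        bootstrap_means[ii] = data[idx].mean(axis=0)
    # One row per requested percentile, one column per data column
    return np.percentile(bootstrap_means, percentiles, axis=0)

# Synthetic data: 200 observations of 2 columns (made up for illustration)
data = np.random.default_rng(42).normal(loc=[0.0, 5.0], scale=1.0, size=(200, 2))
lower, upper = bootstrap_interval_np(data, seed=0)
print(lower, upper)  # lower and upper bounds of the interval for each column's mean
```

Unpacking the result into lower and upper works because the default percentiles=(2.5, 97.5) produces exactly two rows. If reproducibility is needed with the sklearn version instead, resample also accepts a random_state argument.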