1. Bootstrap confidence intervals
In the last video, we learned how to take a set of data, create a bootstrap sample, and then compute a bootstrap replicate of a given statistic. Since we will generate replicates over and over again, we can write a function to generate a bootstrap replicate.
2. Bootstrap replicate function
We will call the function bootstrap_replicate_1d, since it works on one-dimensional arrays. We pass in the data and also a function that computes the statistic of interest. We could pass np.mean or np.median, for example. Generating a replicate takes two steps. First, we randomly choose entries out of the data array, with replacement, so that the bootstrap sample has the same number of entries as the original data. Then, we compute the statistic using the specified function. If we call the function, we get a bootstrap replicate. And we can do this over and over again. So, how do we do it over and over again?
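The two steps just described can be sketched as follows. This is a minimal sketch and not necessarily the exact code on the slide. The data array here is a synthetic placeholder (an assumption for illustration), not the real measurements, and the names bs_sample, rep_mean, and rep_median are illustrative.

```python
import numpy as np


def bootstrap_replicate_1d(data, func):
    """Generate one bootstrap replicate of a statistic from 1D data."""
    # Step 1: resample with replacement, keeping the original sample size
    bs_sample = np.random.choice(data, size=len(data))
    # Step 2: compute the statistic of interest on the bootstrap sample
    return func(bs_sample)


# Placeholder data (assumption): synthetic values standing in for real measurements
np.random.seed(42)
data = np.random.normal(loc=10.0, scale=2.0, size=100)

# Each call draws a fresh bootstrap sample, so the replicates generally differ
rep_mean = bootstrap_replicate_1d(data, np.mean)
rep_median = bootstrap_replicate_1d(data, np.median)
print(rep_mean, rep_median)
```

Because the statistic is passed in as a function, the same helper works for the mean, the median, or any other function that reduces a one-dimensional array to a single number.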
3. Many bootstrap replicates
With a for loop! First, we have to initialize an array to store our bootstrap replicates. We will make 10,000 replicates, so we use np.empty to create an uninitialized array of that length. Next, we write a for loop to generate a replicate and store it in the bs_replicates array. Now that we have the replicates,
4. Plotting a histogram of bootstrap replicates
we can make a histogram to see what we might expect to get for the mean of repeated measurements of the speed of light. Note that we use the normed=True keyword argument (newer versions of Matplotlib replace it with density=True). This sets the height of the bars of the histogram such that the total area of the bars is equal to one. This is called
5. Bootstrap estimate of the mean
normalization, and we do it so that the histogram approximates a probability density function. You'll recall from the prequel to this course that the area under the PDF gives a probability. So, we have computed the approximate PDF of the mean speed of light we would expect to get if we performed the measurements again. Now we're thinking probabilistically! If we repeat the experiment again and again, we are likely to see the sample mean vary by only about 30 km/s. It is also useful to summarize this result without having to resort to a graphical method like a histogram. To do this,
6. Confidence interval of a statistic
we will compute the 95% confidence interval of the mean. The p% confidence interval is defined as follows. If we repeated measurements over and over again, p% of the observed values would lie within the p% confidence interval. In our case, if we repeated the 100 measurements of the speed of light over and over again, 95% of the sample means would lie within the 95% confidence interval.
7. Bootstrap confidence interval
By generating bootstrap replicates, we just "repeated" the experiment over and over again. So, we just use np.percentile to compute the 2.5th and 97.5th percentiles of the replicates to get the 95% confidence interval. This is indeed consistent with what we see in the histogram.
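Putting the pieces together, the for loop, the normalized histogram, and the percentile calculation might look like the sketch below. The data are again a synthetic placeholder (an assumption), not the speed-of-light measurements, so the numbers it prints are illustrative only. The names n_reps, heights, bin_edges, total_area, and conf_int are ours, and np.histogram is used here only to check the normalization numerically without drawing a plot.

```python
import numpy as np


def bootstrap_replicate_1d(data, func):
    """Generate one bootstrap replicate of a statistic from 1D data."""
    return func(np.random.choice(data, size=len(data)))


# Placeholder data (assumption): 100 synthetic values, NOT the real measurements
np.random.seed(42)
data = np.random.normal(loc=10.0, scale=2.0, size=100)

# Initialize an array to hold the replicates, then fill it in a for loop
n_reps = 10000
bs_replicates = np.empty(n_reps)
for i in range(n_reps):
    bs_replicates[i] = bootstrap_replicate_1d(data, np.mean)

# Normalized histogram: bar heights are scaled so the total bar area is one
heights, bin_edges = np.histogram(bs_replicates, bins=50, density=True)
total_area = np.sum(heights * np.diff(bin_edges))

# 95% confidence interval: the 2.5th and 97.5th percentiles of the replicates
conf_int = np.percentile(bs_replicates, [2.5, 97.5])
print(total_area, conf_int)
```

To draw the histogram rather than just check its area, Matplotlib's plt.hist accepts the same density=True keyword.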
8. Let's practice!
Now it's time for you to get some of your own bootstrap confidence intervals.