Pairs bootstrap

1. Pairs bootstrap

When we computed bootstrap confidence intervals on summary statistics, we did so

2. Nonparametric inference

nonparametrically. By this, I mean that we did not assume any model underlying the data; the estimates were done using the data alone.

3. 2008 US swing state election results

When we performed a linear least squares regression, however, we were using a linear model, which has two parameters, the slope and intercept. This was a parametric estimate. The optimal parameter values we compute for our parametric model are like other statistics, in that we would get different values for them if we acquired the data again. We can perform bootstrap estimates to get confidence intervals on the slope and intercept as well. Remember: we need to think probabilistically. Let's consider the swing state election data from the prequel to this course. What if we had the election again, under identical conditions? How would the slope and intercept change? This is kind of a tricky question; there are several ways to get bootstrap estimates of the confidence intervals on these parameters, each of which makes different assumptions about the data. We will use a method that makes the fewest assumptions,

4. Pairs bootstrap for linear regression

called pairs bootstrap. Because each county has two variables associated with it, the vote share for Obama and the total number of votes, we cannot resample the individual values independently; instead, we resample pairs. For the election data, we randomly select a given county and keep its total votes and Democratic share as a pair. So our bootstrap sample consists of a set of (x, y) pairs. We then compute the slope and intercept from this pairs bootstrap sample to get the bootstrap replicates. You can get confidence intervals from many bootstrap replicates of the slope and intercept, just like before. Let's see how this works in practice.
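As a rough sketch of the resampling just described, the snippet below draws one pairs bootstrap sample and computes one bootstrap replicate of the slope and intercept. The arrays total_votes and dem_share are synthetic stand-ins generated on the spot, not the actual election data, and all variable names are chosen here for illustration.

```python
import numpy as np

np.random.seed(42)

# Synthetic stand-in data (NOT the real 2008 election results):
# total votes in thousands and Democratic vote share in percent
total_votes = np.random.uniform(1, 500, size=200)
dem_share = 40 + 0.04 * total_votes + np.random.normal(0, 8, size=200)

# Indices of the data points: 0, 1, ..., n - 1
inds = np.arange(len(total_votes))

# Sample the indices with replacement (np.random.choice replaces by default)
bs_inds = np.random.choice(inds, size=len(inds))

# Index both arrays with the same resampled indices so each
# (total_votes, dem_share) pair stays together
bs_total_votes = total_votes[bs_inds]
bs_dem_share = dem_share[bs_inds]

# One bootstrap replicate of the slope and intercept
bs_slope, bs_intercept = np.polyfit(bs_total_votes, bs_dem_share, 1)

# Regression on the original data, for comparison
slope, intercept = np.polyfit(total_votes, dem_share, 1)

print(slope, intercept)
print(bs_slope, bs_intercept)
```

Indexing both arrays with the same bs_inds is what keeps each county's two values paired; resampling the two arrays separately would break the relationship between them.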

5. Generating a pairs bootstrap sample

Because np dot random dot choice must sample a 1D array, we will sample the indices of the data points. We can generate the indices of a NumPy array using the np dot arange function. It gives us an array of sequential integers. We then sample the indices with replacement. The bootstrap sample is generated by slicing out the respective values from the original data arrays. With these in hand,

6. Computing a pairs bootstrap replicate

we can perform a linear regression using np dot polyfit on the pairs bootstrap sample to get a bootstrap replicate. If we compare the result to the linear regression on the original data, they are close, but not equal. As we have seen before, you can use many of these replicates to generate bootstrap confidence intervals for the slope and intercept using np dot percentile. You can also

7. 2008 US swing state election results

plot the lines you get from your bootstrap replicates to get a graphical idea of how the regression line may change if the data were collected again. You will work through this whole procedure in the exercises.
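The whole procedure might look something like the sketch below, which repeats the pairs bootstrap many times and takes percentiles of the replicates. The function name draw_bs_pairs_linreg, the choice of 1000 replicates, and the synthetic x and y arrays are assumptions made for illustration; they are not given in this transcript.

```python
import numpy as np


def draw_bs_pairs_linreg(x, y, size=1):
    """Draw `size` pairs bootstrap replicates of slope and intercept.

    Hypothetical helper for illustration only.
    """
    inds = np.arange(len(x))
    bs_slope_reps = np.empty(size)
    bs_intercept_reps = np.empty(size)

    for i in range(size):
        # Resample indices with replacement, keeping (x, y) pairs together
        bs_inds = np.random.choice(inds, size=len(inds))
        bs_slope_reps[i], bs_intercept_reps[i] = np.polyfit(
            x[bs_inds], y[bs_inds], 1
        )

    return bs_slope_reps, bs_intercept_reps


np.random.seed(42)

# Synthetic stand-in data (NOT the real election results)
x = np.random.uniform(1, 500, size=200)
y = 40 + 0.04 * x + np.random.normal(0, 8, size=200)

# Point estimates from the original data
slope, intercept = np.polyfit(x, y, 1)

# Many bootstrap replicates, then 95% confidence intervals
bs_slope_reps, bs_intercept_reps = draw_bs_pairs_linreg(x, y, size=1000)
slope_conf_int = np.percentile(bs_slope_reps, [2.5, 97.5])
intercept_conf_int = np.percentile(bs_intercept_reps, [2.5, 97.5])

print("slope:", slope, "95% CI:", slope_conf_int)
print("intercept:", intercept, "95% CI:", intercept_conf_int)
```

Each pair (bs_slope_reps[i], bs_intercept_reps[i]) defines one bootstrap regression line, so plotting a subset of them over the data gives the graphical picture described above.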

8. Let's practice!

When you do, always keep in mind that you are thinking probabilistically. Getting an optimal parameter value is the first step. Now, you are finding out how that parameter is likely to change upon repeated measurements. Happy coding!