Jackknife resampling
1. Jackknife resampling
Next we turn our attention to jackknife resampling. The jackknife estimation process was developed before bootstrapping but is quite similar. John Tukey proposed the name "jackknife" because he saw this procedure as a quick tool that could be applied to a variety of problems. It is particularly useful when the underlying distribution of the data is unknown. Just like bootstrapping, you create multiple samples from the original dataset and calculate your statistic for each of the samples. What differs is how these samples are created. To better understand this resampling process, let's return to the Easter eggs example.

2. Easter eggs
Let's recall the Easter eggs example. You've received a large shipment of Easter eggs and are interested in determining the average weight of each egg for quality control. You have access to a small sample of 10 eggs. You weigh these eggs and find 4 that weigh 20g, 3 that weigh 70g, and 3 others weighing 50g, 90g, and 80g respectively.

3. Easter eggs
Just like we did in the previous video, you can easily calculate a mean of 51, a standard deviation of 27, and a standard error of 8.53, and then multiply this standard error by 1.96 to get a 95% confidence interval between 34.27 and 67.73. This time, let's see how jackknife resampling helps us attack this problem from a slightly different angle.

4. Jackknifing Easter eggs
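As a quick sketch, the plain calculation above can be reproduced in a few lines of Python. The `weights` list below is a hypothetical encoding of the egg counts described earlier, and the standard deviation is computed as a population standard deviation (dividing by n), which matches the stated value of 27:

```python
import math

# Hypothetical list of the 10 egg weights described above (in grams)
weights = [20, 20, 20, 20, 70, 70, 70, 50, 90, 80]
n = len(weights)

mean = sum(weights) / n                                    # 51.0
sd = math.sqrt(sum((w - mean) ** 2 for w in weights) / n)  # 27.0 (population sd)
se = sd / math.sqrt(n)                                     # ≈ 8.54

# 95% confidence interval using the 1.96 normal critical value
ci = (mean - 1.96 * se, mean + 1.96 * se)                  # ≈ (34.27, 67.73)
print(mean, round(sd, 2), round(se, 2), ci)
```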
Our overall approach is very similar to that of bootstrap sampling: we generate multiple datasets, calculate the quantity of interest in each sample, and then aggregate those values to get an overall estimate. While in bootstrapping we used sampling with replacement, here we use leave-one-out resampling. Consider this figure with three possible samples drawn from the original sample. The first one has 9 eggs with the 50g egg removed, the second has the 90g egg removed, while the third leaves out the 80g egg. This process is repeated until we have every sample in which one egg is left out.

5. Jackknife estimate
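The leave-one-out samples described above can be generated with a simple list slice: the i-th jackknife sample is the dataset with the i-th observation dropped. Using the same hypothetical `weights` list as before:

```python
# Hypothetical egg weights from the example (in grams)
weights = [20, 20, 20, 20, 70, 70, 70, 50, 90, 80]

# One jackknife sample per observation: the dataset with that egg removed
jackknife_samples = [
    weights[:i] + weights[i + 1:] for i in range(len(weights))
]

print(len(jackknife_samples))     # 10 samples in total
print(len(jackknife_samples[0]))  # each sample holds n - 1 = 9 eggs
```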
Although this slide looks math heavy, it's actually very straightforward. The first equation tells us that theta jackknife, the jackknife estimate of your quantity of interest, is the mean of the values you get from each of the jackknife samples, where each sample leaves out one observation. The variance of the jackknife estimate looks very much like the variance of those per-sample values, except we have an n minus one term in the numerator. Without going into the math behind it, we mainly need this term because each jackknife sample only has n minus one observations, so it helps correct for the bias that introduces.

6. Jackknife vs Bootstrap
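Putting the two equations into code, a minimal sketch for the Easter eggs data looks like this, again assuming the hypothetical `weights` list from before. The statistic of interest here is the mean, so each leave-one-out value is just the mean of the remaining nine eggs:

```python
import math

# Hypothetical egg weights from the example (in grams)
weights = [20, 20, 20, 20, 70, 70, 70, 50, 90, 80]
n = len(weights)

# Leave-one-out means: the statistic computed on each jackknife sample
loo_means = [(sum(weights) - w) / (n - 1) for w in weights]

# Jackknife estimate: the mean of the leave-one-out values
theta_jack = sum(loo_means) / n                              # ≈ 51.0

# Jackknife variance, with the (n - 1) bias-correction factor
var_jack = (n - 1) / n * sum((t - theta_jack) ** 2 for t in loo_means)
se_jack = math.sqrt(var_jack)                                # ≈ 9.0

# 95% confidence interval for the jackknife estimate
ci = (theta_jack - 1.96 * se_jack, theta_jack + 1.96 * se_jack)
print(theta_jack, se_jack, ci)
```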
Returning to the Easter eggs example and applying these equations, we get a jackknife estimate for the mean of 51g with a confidence interval between 33.36g and 68.64g. This is pretty close to the bootstrapped mean of 50.8g.

7. Let's practice!
Now that we have an understanding of how the jackknife compares to bootstrapping, let's work on some examples.