Introduction to resampling methods
1. Introduction to resampling methods
Welcome to this chapter, where we will study resampling methods: an extremely useful class of statistical data analysis techniques closely related to the concept of statistical simulation. They have wide applications in areas like model validation, uncertainty estimation, and significance testing. The underlying idea is that you simulate multiple instances of your dataset by resampling it. Let's look at how this is done in practice.

2. Resampling workflow
In a typical resampling workflow, we start with a dataset and apply a resampling method to create a new dataset. We then analyze this new dataset to compute an estimate of some quantity of interest, and we repeat this process multiple times to obtain multiple values of that estimate. And it's as simple as that! No complicated statistical formulae to remember. In general, this simple workflow is how we use resampling methods for data analysis. Let's dig into some advantages and drawbacks.

3. Why resample?
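To see just how simple this workflow is in code, here is a minimal sketch, assuming Python with NumPy; the dataset, the number of iterations, and the choice of the mean as the estimator are all illustrative choices of our own:

```python
# A minimal sketch of the generic resampling workflow, assuming NumPy.
# The toy dataset, iteration count, and estimator (the mean) are
# illustrative choices, not from the course.
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=10, scale=2, size=100)   # 1. start with a dataset

estimates = []
for _ in range(1000):                          # 4. repeat multiple times
    new_data = rng.choice(data, size=len(data), replace=True)  # 2. resample
    estimates.append(new_data.mean())          # 3. analyze the new dataset

# `estimates` now approximates the sampling distribution of the mean
```

Each pass through the loop is one run of the dataset → resample → analyze steps; the collected estimates are what we study afterwards.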
At a very basic level, resampling methods are attractive because of their simplicity. They are conceptually simple to implement and apply even to complex estimators. For example, there is a statistical formula for estimating the confidence interval of a population mean, but what about the confidence interval for the 35th percentile of the population distribution? Resampling methods easily allow such estimations. In general, resampling methods make no strict assumptions about the distribution of the data. The drawback, of course, is that they tend to be computationally expensive. However, with the advent of more powerful computers, this has become less of an issue in recent years. Let's look at the three major types of resampling methods.

4. Types of resampling methods
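As a first concrete example, here is a hedged sketch of the most common type, the bootstrap, used to estimate a confidence interval for the 35th percentile mentioned earlier. It assumes NumPy, and the toy data and iteration counts are our own:

```python
# A sketch of a bootstrap confidence interval for the 35th percentile,
# an estimate with no simple closed-form formula. Assumes NumPy; the
# data and iteration count are illustrative.
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=5, size=200)  # a skewed toy dataset

boot_percentiles = []
for _ in range(2000):
    # resample with replacement, then compute the 35th percentile
    sample = rng.choice(data, size=len(data), replace=True)
    boot_percentiles.append(np.percentile(sample, 35))

# a simple 95% confidence interval from the bootstrap distribution
lower, upper = np.percentile(boot_percentiles, [2.5, 97.5])
```

Notice that switching to a different estimator only requires changing one line inside the loop; no new formula is needed.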
Generally speaking, there are three broad types of resampling methods. Bootstrap resampling is the most common: we sample from the dataset repeatedly, with replacement. Jackknife resampling is very similar to bootstrapping, except that there is no random sampling; instead, one or more observations from the original dataset are systematically excluded when creating new datasets. Jackknife methods are quite useful for estimating the bias and variance of estimators. Although the jackknife was developed before the bootstrap, it can be seen as a linear approximation of the bootstrap. Finally, we have permutation testing, which involves switching the labels in the dataset.

5. Let's practice!
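As a small taste of the practice ahead, here is a sketch of the other two types, jackknife resampling and a permutation test, again assuming NumPy; all the values and group sizes below are made up for illustration:

```python
# Hedged sketches of jackknife and permutation resampling; all data
# values and group sizes here are made up for illustration.
import numpy as np

data = np.array([3.1, 4.7, 2.2, 5.8, 3.9])

# Jackknife: systematically leave out one observation at a time
# (no random sampling involved).
jackknife_means = np.array(
    [np.delete(data, i).mean() for i in range(len(data))]
)

# Permutation test: switch the group labels at random to build a
# null distribution for the difference in group means.
group_a = np.array([12.0, 11.5, 13.2, 12.8])
group_b = np.array([10.1, 9.8, 10.5, 10.9])
observed = group_a.mean() - group_b.mean()

rng = np.random.default_rng(1)
pooled = np.concatenate([group_a, group_b])
null_diffs = []
for _ in range(5000):
    shuffled = rng.permutation(pooled)          # shuffle the labels
    null_diffs.append(shuffled[:4].mean() - shuffled[4:].mean())

# p-value: how often a label shuffle gives a difference at least
# as extreme as the observed one
p_value = np.mean(np.abs(null_diffs) >= abs(observed))
```

The jackknife loop is deterministic, which is exactly what distinguishes it from the bootstrap, while the permutation test asks how surprising the observed group difference would be if the labels carried no information.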
We will dive deeper into each of these areas in the coming videos. First, let's do a simple review of sampling.