Generating a bootstrap distribution

The process for generating a bootstrap distribution is remarkably similar to the process for generating a sampling distribution; only the first step is different.

To make a sampling distribution, you start with the population and sample without replacement. To make a bootstrap distribution, you start with a sample and resample it with replacement. After that, the steps are the same: calculate the summary statistic you are interested in on that sample or resample, then repeat the process many times. In each case, you can visualize the distribution with a histogram.
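
The two procedures can be compared side by side in a minimal base R sketch. The `population` vector here is simulated and purely hypothetical (it is not the Spotify data), and the sample size of 200 and the 1000 replicates are arbitrary choices made for illustration.

```r
set.seed(42)

# Hypothetical stand-in population: 10,000 simulated scores between 0 and 1
population <- runif(10000)

# Sampling distribution: each replicate draws a fresh sample from the
# population WITHOUT replacement, then computes the mean
sampling_dist <- replicate(
  1000,
  mean(sample(population, size = 200, replace = FALSE))
)

# Bootstrap distribution: start from ONE sample, then each replicate
# resamples that sample WITH replacement, at the same size
one_sample <- sample(population, size = 200, replace = FALSE)
bootstrap_dist <- replicate(
  1000,
  mean(sample(one_sample, size = length(one_sample), replace = TRUE))
)

# Visualize the bootstrap distribution with a histogram
hist(bootstrap_dist, main = "Bootstrap distribution of the mean", xlab = "Mean")
```

Only the first step differs between the two blocks: where the draw comes from and whether `replace` is `TRUE`. The "calculate a statistic, then replicate" steps are identical.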

Here, spotify_sample is a subset of the spotify_population dataset. To make it easier to see how resampling works, a row ID column has been added, and only the artist name, song name, and danceability columns have been included.

spotify_sample is available; dplyr and ggplot2 are loaded.

This exercise is part of the course Sampling in R.



Have a go at this exercise by completing this sample code.

# Generate 1 bootstrap resample
spotify_1_resample <- ___


# See the result
spotify_1_resample
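
One possible way to draw a single bootstrap resample, sketched on a small made-up data frame rather than the real spotify_sample. The column names (`id`, `artist`, `song`, `danceability`) and all values are assumptions for illustration only. It uses dplyr's `slice_sample()` with `prop = 1` and `replace = TRUE`, which is one approach among several and may not be the one the exercise expects.

```r
library(dplyr)

set.seed(2024)

# Hypothetical stand-in for spotify_sample; names and values are invented
toy_sample <- data.frame(
  id = 1:6,
  artist = paste("Artist", LETTERS[1:6]),
  song = paste("Song", 1:6),
  danceability = c(0.61, 0.45, 0.78, 0.52, 0.69, 0.33)
)

# prop = 1 keeps the resample the same size as the original sample;
# replace = TRUE allows the same row to be drawn more than once
toy_resample <- toy_sample %>%
  slice_sample(prop = 1, replace = TRUE)

# See the result
toy_resample

# The id column shows which rows were drawn more than once
toy_resample %>%
  count(id) %>%
  filter(n > 1)

# The summary statistic for this one resample
mean(toy_resample$danceability)
```

Because rows are drawn with replacement, some IDs typically appear several times and others not at all, which is why the row ID column makes the resampling easy to see.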
