Generating a bootstrap distribution
The process for generating a bootstrap distribution is remarkably similar to the process for generating a sampling distribution; only the first step is different.
To make a sampling distribution, you start with the population and sample without replacement. To make a bootstrap distribution, you start with a sample and sample that with replacement. After that, the steps are the same: calculate the summary statistic that you are interested in on that sample/resample, then replicate the process many times. In each case, you can visualize the distribution with a histogram.
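The two recipes above can be sketched side by side in base R. This is only an illustration under stated assumptions: `population` is a synthetic stand-in (not the Spotify data), and the sample size of 200 and the 1000 replicates are arbitrary choices.

```r
set.seed(42)

# Hypothetical stand-in population: 10,000 danceability-like scores in [0, 1]
population <- runif(10000)

# Sampling distribution: repeatedly sample the population WITHOUT replacement
sampling_distn <- replicate(
  1000,
  mean(sample(population, size = 200, replace = FALSE))
)

# Bootstrap distribution: take one sample, then resample it WITH replacement
one_sample <- sample(population, size = 200, replace = FALSE)
bootstrap_distn <- replicate(
  1000,
  mean(sample(one_sample, size = length(one_sample), replace = TRUE))
)

# Visualize each distribution with a histogram
hist(sampling_distn, main = "Sampling distribution of the mean")
hist(bootstrap_distn, main = "Bootstrap distribution of the mean")
```

Only the inner `sample()` call differs between the two: what it draws from, and whether `replace` is `TRUE`.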
Here, spotify_sample is a subset of the spotify_population dataset. To make it easier to see how resampling works, a row ID column has been added, and only the artist name, song name, and danceability columns have been included.
spotify_sample is available; dplyr and ggplot2 are loaded.
This exercise is part of the course Sampling in R.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Generate 1 bootstrap resample
spotify_1_resample <- ___
# See the result
spotify_1_resample
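As a hedged illustration of what a single bootstrap resample looks like, the sketch below uses a small made-up data frame, `toy_sample`, in place of spotify_sample; its column names and values are assumptions, not the real dataset. It relies on dplyr's `slice_sample()` with `prop = 1` and `replace = TRUE`, which is one reasonable way to resample rows, not necessarily the approach the exercise expects.

```r
library(dplyr)

set.seed(123)

# Hypothetical toy data standing in for spotify_sample (names and values are made up)
toy_sample <- tibble(
  ID = 1:5,
  artist = c("Artist A", "Artist B", "Artist C", "Artist D", "Artist E"),
  song = c("Song 1", "Song 2", "Song 3", "Song 4", "Song 5"),
  danceability = c(0.61, 0.45, 0.78, 0.52, 0.69)
)

# prop = 1 keeps the resample the same size as the original;
# replace = TRUE lets the same row be drawn more than once
toy_1_resample <- toy_sample %>%
  slice_sample(prop = 1, replace = TRUE)

# See the result: typically some IDs repeat while others are missing
toy_1_resample
```

Repeated values in the ID column are what show that sampling was done with replacement.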