Generating a bootstrap distribution
The process for generating a bootstrap distribution is similar to the process for generating a sampling distribution; only the first step is different.
To make a sampling distribution, you start with the population and sample without replacement. To make a bootstrap distribution, you start with a sample and resample it with replacement, drawing as many rows as the sample contains. After that, the steps are the same: calculate the summary statistic you are interested in on that sample or resample, then repeat the process many times. In each case, you can visualize the resulting distribution with a histogram.
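The bootstrap steps described above can be sketched roughly as follows. This is a minimal illustration and not the exercise solution: toy_sample, its uniformly generated danceability values, and the choice of 1000 replicates are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Seed NumPy's global generator so the resamples are reproducible
np.random.seed(0)

# Hypothetical stand-in for a sample taken from a population (made-up values)
toy_sample = pd.DataFrame({"danceability": np.random.uniform(0, 1, size=200)})

boot_means = []
for _ in range(1000):
    # Step 1: resample the sample itself, with replacement, at full size
    resample = toy_sample.sample(frac=1, replace=True)
    # Step 2: calculate the summary statistic on the resample
    boot_means.append(resample["danceability"].mean())

# Step 3: the replicated statistics form the bootstrap distribution
boot_means = np.array(boot_means)
```

Passing boot_means to plt.hist() would then draw the histogram of the bootstrap distribution, assuming matplotlib.pyplot is imported as plt.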
Here, spotify_sample is a subset of the spotify_population dataset. To make it easier to see how resampling works, a row index column called 'index' has been added, and only the artist name, song name, and danceability columns have been included.
spotify_sample is available; pandas, numpy, and matplotlib.pyplot are loaded with their usual aliases.
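To see why an 'index' column helps when inspecting a resample, here is a small sketch on made-up data. The toy frame, its artist and song names, and its danceability values are hypothetical and are not taken from spotify_sample.

```python
import numpy as np
import pandas as pd

np.random.seed(1)

# Hypothetical five-row frame; reset_index() adds a column named 'index'
toy = pd.DataFrame({
    "artists": ["artist_a", "artist_b", "artist_c", "artist_d", "artist_e"],
    "name": ["song_1", "song_2", "song_3", "song_4", "song_5"],
    "danceability": [0.41, 0.77, 0.55, 0.63, 0.29],
}).reset_index()

# One bootstrap resample: same number of rows, drawn with replacement
resample = toy.sample(frac=1, replace=True)

# Count how many distinct original rows ended up in the resample
n_unique = resample["index"].nunique()
print(resample)
print(n_unique, "distinct rows out of", len(toy))
```

In the printed resample, a repeated value in the 'index' column marks a row that was drawn more than once, and a value that is absent marks a row that was not drawn at all.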
This exercise is part of the course Sampling in Python.
Have a go at this exercise by completing this sample code.
# Generate 1 bootstrap resample
spotify_1_resample = ____
# Print the resample
print(spotify_1_resample)