Sampling distribution vs. bootstrap distribution

The sampling distribution and the bootstrap distribution are closely linked. Repeated sampling from a population is rarely possible in practice, but when it is, and while you are learning about both, it helps to generate the sampling distribution and the bootstrap distribution one after the other to see how they relate.

Here, the statistic you are interested in is the mean popularity score of the songs.

spotify_population (the whole dataset) and spotify_sample (500 rows, representing an original sample) are available; dplyr is loaded.

1
- Generate a sampling distribution of 2000 replicates.
- Sample 500 rows of the population without replacement.
- Calculate the statistic of interest (the mean popularity) in the column mean_popularity.
- Pull out the statistic so it is a single numeric value (not a tibble).
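
The steps above could be sketched roughly as follows. This is a minimal illustration, not the exercise's official solution: the real spotify_population is not reproduced here, so a synthetic stand-in is built, and the column name popularity and the result name mean_popularity_2000_samp are assumptions.

```r
library(dplyr)

set.seed(42)

# Hypothetical stand-in for spotify_population (assumption: the real
# dataset has a numeric column named `popularity`).
spotify_population <- tibble(popularity = runif(5000, min = 0, max = 100))

# Sampling distribution: repeat the draw-and-summarize step 2000 times.
mean_popularity_2000_samp <- replicate(
  n = 2000,
  expr = {
    spotify_population %>%
      # Draw 500 rows from the population; slice_sample() samples
      # without replacement by default.
      slice_sample(n = 500) %>%
      # Compute the statistic of interest in a column called mean_popularity.
      summarize(mean_popularity = mean(popularity)) %>%
      # Extract it as a single numeric value rather than a tibble.
      pull(mean_popularity)
  }
)
```

Because each replicate returns a single number, replicate() collects the results into a numeric vector of length 2000, which can then be summarized or plotted.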

2
- Generate a bootstrap distribution of 2000 replicates.
- Sample 500 rows of the sample with replacement.
- Calculate the statistic of interest (the mean popularity) in the column mean_popularity.
- Pull out the statistic so it is a single numeric value (not a tibble).
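
A bootstrap version might look like the sketch below. As before, the data are a synthetic stand-in, and the column name popularity and the result name mean_popularity_2000_boot are assumptions rather than details confirmed by the exercise. The only structural changes from a sampling distribution are that the resampling starts from the sample instead of the population, and that rows are drawn with replacement.

```r
library(dplyr)

set.seed(42)

# Hypothetical stand-ins (assumption: a numeric `popularity` column).
spotify_population <- tibble(popularity = runif(5000, min = 0, max = 100))
spotify_sample <- spotify_population %>% slice_sample(n = 500)

# Bootstrap distribution: resample the sample itself 2000 times.
mean_popularity_2000_boot <- replicate(
  n = 2000,
  expr = {
    spotify_sample %>%
      # Draw 500 rows (100% of the sample) WITH replacement.
      slice_sample(prop = 1, replace = TRUE) %>%
      # Compute the statistic of interest in a column called mean_popularity.
      summarize(mean_popularity = mean(popularity)) %>%
      # Extract it as a single numeric value rather than a tibble.
      pull(mean_popularity)
  }
)
```

The bootstrap means center on the mean of spotify_sample rather than on the population mean, which is one of the relationships this exercise is meant to make visible.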
