1. Creating a sampling distribution
We just saw how point estimates like the sample mean will vary depending on which rows end up in the sample.
2. Same code, different answer
For example, this same code to calculate the mean cup points from a simple random sample of thirty coffees gives a slightly different answer each time. Let's try to visualize and quantify this variation.
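As a minimal sketch of that behavior, the snippet below draws two simple random samples of thirty rows and prints both means. It assumes coffee_ratings is a pandas DataFrame with a column named total_cup_points. The column name and the synthetic stand-in data are assumptions made so the example runs on its own; they are not the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for coffee_ratings (assumption: the real data is not used here).
rng = np.random.default_rng(seed=0)
coffee_ratings = pd.DataFrame(
    {"total_cup_points": rng.normal(loc=82, scale=2.7, size=1300)}
)

# The same line of code, run twice, samples different rows each time.
first_mean = coffee_ratings.sample(n=30)["total_cup_points"].mean()
second_mean = coffee_ratings.sample(n=30)["total_cup_points"].mean()

print(first_mean)
print(second_mean)
```

No random_state is passed to sample here, which is why the two printed means are expected to differ slightly.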
3. Same code, 1000 times
A for loop lets us run the same code many times. It's especially useful for situations like this where the result contains some randomness.
We start by creating an empty list to store the means. Then, we set up the for loop to repeatedly sample 30 coffees from coffee_ratings a total of 1000 times, calculating the mean cup points each time. After each calculation, we append the result, also called a replicate, to the list.
Each time the code is run, we get one sample mean, so running the code a thousand times generates a list of one thousand sample means.
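The loop described above might look roughly like this. Again, the total_cup_points column name, the list name mean_cup_points_1000, and the synthetic stand-in for coffee_ratings are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for coffee_ratings (assumption, not the real dataset).
rng = np.random.default_rng(seed=0)
coffee_ratings = pd.DataFrame(
    {"total_cup_points": rng.normal(loc=82, scale=2.7, size=1300)}
)

# Start with an empty list to hold the replicates.
mean_cup_points_1000 = []

# Repeat the sampling 1000 times, storing one sample mean per iteration.
for _ in range(1000):
    sample_mean = coffee_ratings.sample(n=30)["total_cup_points"].mean()
    mean_cup_points_1000.append(sample_mean)

print(len(mean_cup_points_1000))
print(mean_cup_points_1000[:5])
```

The loop variable is written as an underscore because its value is never used; only the number of repetitions matters.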
4. Distribution of sample means for size 30
The one thousand sample means form a distribution of sample means. To visualize a distribution, the best plot is often a histogram.
Here we can see that most of the results lie between eighty-one and eighty-three, and they roughly follow a bell-shaped curve, like a normal distribution.
There's an important piece of jargon we need to know here. A distribution of replicates of sample means, or other point estimates, is known as a sampling distribution.
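One way to draw such a histogram is with matplotlib, sketched below. The choice of thirty bins, the axis labels, the column name total_cup_points, and the synthetic stand-in data are all assumptions, so the exact shape will not match the plot discussed here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for coffee_ratings (assumption, not the real dataset).
rng = np.random.default_rng(seed=0)
coffee_ratings = pd.DataFrame(
    {"total_cup_points": rng.normal(loc=82, scale=2.7, size=1300)}
)

# Build 1000 replicates of the sample mean for samples of size 30.
mean_cup_points_1000 = [
    coffee_ratings.sample(n=30)["total_cup_points"].mean()
    for _ in range(1000)
]

# Plot the sampling distribution of the mean as a histogram.
fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(mean_cup_points_1000, bins=30)
ax.set_xlabel("Mean cup points (sample size 30)")
ax.set_ylabel("Number of replicates")
# Call plt.show() or fig.savefig("hist.png") to view the figure.
```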
5. Different sample sizes
Here are histograms from running the same code again with different sample sizes. When we decrease the sample size from thirty to six, we can see from the x-values that the range of the results is broader. The bulk of the results now lies between eighty and eighty-four.
On the other hand, increasing the sample size to one hundred and fifty results in a much narrower range. Now most of the results are between eighty-one-point-eight and eighty-two-point-six.
As we saw previously, bigger sample sizes give us more accurate results: the sample means cluster more tightly around the same central value. By replicating the sampling many times, as we've done here, we can quantify that accuracy.
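One way to put a number on that accuracy is to compare the standard deviation of the replicates for each sample size. The sketch below does this for sizes six, thirty, and one hundred and fifty. Using the standard deviation as the measure of spread is a choice made for this example, and the column name and synthetic data are assumptions as before.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for coffee_ratings (assumption, not the real dataset).
rng = np.random.default_rng(seed=0)
coffee_ratings = pd.DataFrame(
    {"total_cup_points": rng.normal(loc=82, scale=2.7, size=1300)}
)

spreads = {}
for sample_size in [6, 30, 150]:
    # 1000 replicates of the sample mean at this sample size.
    replicates = [
        coffee_ratings.sample(n=sample_size)["total_cup_points"].mean()
        for _ in range(1000)
    ]
    # Standard deviation of the replicates measures how much the means vary.
    spreads[sample_size] = np.std(replicates, ddof=1)
    print(sample_size, spreads[sample_size])
```

The printed spread should shrink as the sample size grows, matching the narrowing histograms.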
6. Let's practice!
Ready to replicate?