
Approximate sampling distributions

1. Approximate sampling distributions

In the last exercise, you may have noticed that while increasing the number of replicates didn't affect the relative error of the sample means, it did result in a more consistent shape to the distribution.

2. 4 dice

Let's consider the case of rolling four six-sided dice. We can generate all possible combinations of rolls using expand_grid from the tidyr package. There are six to the power of four, or 1296, possible combinations.
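As a minimal sketch of this step, assuming the tidyr package is installed, the grid of combinations might be built as follows. The column names die1 to die4 are illustrative choices, not necessarily the ones used in the course.

```r
library(tidyr)

# Every combination of four six-sided dice, one row per combination.
# The column names die1..die4 are assumed for illustration.
dice <- expand_grid(
  die1 = 1:6,
  die2 = 1:6,
  die3 = 1:6,
  die4 = 1:6
)

# One row per combination, so this should equal 6^4
nrow(dice)
```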

3. Mean roll

Let's consider the mean of the four rolls.
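One way to add the mean of each combination, sketched here with dplyr's mutate, is shown below. The die1 to die4 column names are assumptions; mean_roll is the column name used later in this lesson.

```r
library(tidyr)
library(dplyr)

# Rebuild the grid of all combinations (column names are assumed),
# then add the mean of the four rolls as a new column.
dice <- expand_grid(die1 = 1:6, die2 = 1:6, die3 = 1:6, die4 = 1:6) %>%
  mutate(mean_roll = (die1 + die2 + die3 + die4) / 4)

# The smallest and largest possible means
range(dice$mean_roll)
```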

4. Exact sampling distribution

Since the mean roll takes discrete values, the best way to see the distribution is to convert mean_roll to be a factor and draw a bar plot. This is the exact sampling distribution of the mean roll, because it contains every single possibility.
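A bar plot along these lines could be drawn with ggplot2 roughly as follows. This is a sketch under the same assumed column names, not necessarily the exact code from the slides.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# All combinations of four dice with their mean (die1..die4 are assumed names)
dice <- expand_grid(die1 = 1:6, die2 = 1:6, die3 = 1:6, die4 = 1:6) %>%
  mutate(mean_roll = (die1 + die2 + die3 + die4) / 4)

# Converting mean_roll to a factor gives one bar per distinct value
exact_plot <- ggplot(dice, aes(x = factor(mean_roll))) +
  geom_bar()

exact_plot
```

Because the sum of four dice ranges from 4 to 24, the plot has 21 bars, one for each possible mean from 1 to 6 in steps of 0.25.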

5. The number of outcomes increases fast

Each die we add to the scenario multiplies the number of possible outcomes by six. With just one hundred dice, the number of outcomes is roughly the same as the estimated number of atoms in the observable universe. Long before you start dealing with big datasets, it becomes computationally impossible to calculate the exact sampling distribution. That means we need to rely on approximations.
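To get a feel for this growth, the number of outcomes can be computed directly as six to the power of the number of dice. The particular dice counts below are illustrative choices.

```r
# Number of possible outcomes for n dice is 6^n.
# The values of n_dice are chosen only for illustration.
n_dice <- c(1, 2, 4, 10, 100)
n_outcomes <- 6 ^ n_dice

data.frame(n_dice, n_outcomes)
```

For one hundred dice, the result is on the order of ten to the power of 77, far too many rows to enumerate.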

6. Simulating the mean of four dice rolls

We can generate a sample mean of four dice rolls using the sample function. Notice that we set replace equals TRUE to allow for the fact that several dice might have the same value.
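A minimal sketch of a single simulated sample mean might look like this. The variable name four_rolls and the seed value are assumptions added for illustration and reproducibility.

```r
# Seed chosen arbitrarily so the example is reproducible
set.seed(42)

# Roll four dice; replace = TRUE lets the same face appear more than once
four_rolls <- sample(1:6, size = 4, replace = TRUE)

# The sample mean of the four rolls
mean(four_rolls)
```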

7. Simulating the mean of four dice rolls

Then we use replicate to generate lots of sample means, in this case one thousand. The code to generate each sample mean was two lines, so we have to wrap the expr argument to replicate in braces, like you would do in a for loop or a function body. To make it suitable for ggplot, we store the result inside a tibble.
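The replicate step could be sketched as follows, assuming the tibble package is installed. The names sample_means_1000, sample_means, and sample_mean, as well as the seed, are illustrative.

```r
library(tibble)

# Seed chosen arbitrarily so the example is reproducible
set.seed(42)

# Repeat the two-line simulation 1000 times; the braces group both
# lines into a single expression for the expr argument.
sample_means_1000 <- replicate(
  n = 1000,
  expr = {
    four_rolls <- sample(1:6, size = 4, replace = TRUE)
    mean(four_rolls)
  }
)

# Store the result in a tibble so it can be passed to ggplot
sample_means <- tibble(sample_mean = sample_means_1000)
```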

8. Approximate sampling distribution

Here's the bar plot of the sampling distribution of mean rolls again. This time, it uses the simulated values rather than the exact ones. It's known as the approximate sampling distribution. Notice that although it isn't perfect, it's pretty close to the exact sampling distribution. Usually, we don't have access to the whole population, so we can't calculate the exact sampling distribution. However, we can feel relatively confident that an approximation will provide a good guess as to how the sampling distribution will behave.
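Plotting the approximate sampling distribution might look roughly like the sketch below, which reruns the simulation so it stands alone. The object and column names and the seed are assumptions for illustration.

```r
library(tibble)
library(ggplot2)

# Seed chosen arbitrarily so the example is reproducible
set.seed(42)

# Simulate 1000 sample means of four dice rolls
sample_means <- tibble(
  sample_mean = replicate(
    n = 1000,
    expr = {
      four_rolls <- sample(1:6, size = 4, replace = TRUE)
      mean(four_rolls)
    }
  )
)

# One bar per distinct simulated mean, as for the exact distribution
approx_plot <- ggplot(sample_means, aes(x = factor(sample_mean))) +
  geom_bar()

approx_plot
```

The average of the simulated means should land near 3.5, the mean of a single fair die, although the individual bar heights will vary from run to run.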

9. Let's practice!

Let's sample some distributions.