The approximation shortcut
1. The approximation shortcut
2. Confidence Intervals
In the previous exercises, you calculated two new standard errors: one when there was less data, and the other when p-hat was low. The different values that you observed demonstrate some important properties of standard errors: they increase when n is small and also when p is close to 0.5.
3. Approximation diagram
So far we've estimated them using the computational approach of bootstrapping.
4. Approximation diagram
There is another method, however, that skips the computation entirely and relies upon an approximation.
5. The normal distribution
That approximation is the normal distribution, also known as the bell curve. A useful result in mathematics says that if you have independent observations and a sufficiently large sample size, then p-hat will follow a normal distribution with a known standard deviation. This distribution is called the sampling distribution of p-hat and it's very similar to the bootstrap distribution in that it captures the variability of our estimate across many possible datasets.
6. Standard deviation
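The formula itself appeared on a slide that this transcript does not reproduce. As an assumption about what was shown, the standard textbook result for the sampling distribution of p-hat, with the unknown p replaced by p-hat to get a usable standard error, is:

```latex
\hat{p} \;\approx\; N\!\left(p,\ \sqrt{\frac{p\,(1 - p)}{n}}\right),
\qquad
\mathrm{SE}(\hat{p}) \;=\; \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}
```

The product p(1 - p) is largest at p = 0.5 and n sits in the denominator, which matches the earlier observation that standard errors grow when p is near 0.5 and when n is small.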
This standard deviation formula, then, can be used to estimate the standard error for use in a confidence interval.
7. Assessing model assumptions
When applying this result in practice, it's important to be sure that the assumptions of independence and a large sample aren't wildly off base. To assess independence, you need to consider the method by which the data was collected. A handy rule of thumb to determine if your sample size is large enough is to check that n times p-hat and n times (1 - p-hat) are both greater than or equal to 10.
8. Calculating standard error: approximation
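As a rough sketch of the calculation narrated in this section, the rule-of-thumb check and the approximation formula can be written in a few lines. This is plain Python rather than the course's own code, and the variable names are illustrative. The counts 116 and 35 come from the narration; the sample size of 151 is derived from them rather than stated in the source.

```python
import math

# Counts implied by the narration: n * p_hat = 116 and n * (1 - p_hat) = 35.
n_happy = 116
n_not_happy = 35
n = n_happy + n_not_happy  # 151 respondents (derived, not stated in the source)
p_hat = n_happy / n

# Rule of thumb: both expected counts should be at least 10.
large_enough = n * p_hat >= 10 and n * (1 - p_hat) >= 10

# Normal-approximation standard error of p_hat.
se_approx = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"p_hat = {p_hat:.3f}, large enough: {large_enough}, SE = {se_approx:.3f}")
```

With these counts the standard error comes out at about 0.034, in line with the value quoted in the narration.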
OK, let's try our hand at using this shortcut to find the standard error for the proportion of people that were happy. Let's recompute p-hat, then ask for the number of rows in gss2016. That's the sample size, n. Let's check the rule of thumb to see if our sample size is large enough by multiplying n times p-hat and n times (1 minus p-hat). This gives 116 and 35, so our sample size should be sufficiently large. We also know that the GSS uses random sampling to draw these observations, so it is safe to assume that one person's answer is independent of the next. This all means that the shortcut to calculate the standard error should be a reasonably good approximation. This method gives a value of about 0.034. Okay, that's the approximation approach.
9. Calculating standard error: computation
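For comparison, here is a minimal sketch of the bootstrap calculation in plain Python. It is not the course's code: the data are a hypothetical stand-in rebuilt from the counts 116 and 35 given in the narration, and the seed and the number of replicates are arbitrary choices.

```python
import random
import statistics

# Hypothetical stand-in for the survey data: 116 "happy" (1) and 35 not (0),
# the counts implied by the rule-of-thumb check in the narration.
sample = [1] * 116 + [0] * 35
n = len(sample)

rng = random.Random(2016)  # arbitrary seed, only for reproducibility
reps = 2000                # number of bootstrap replicates (an assumption)

boot_p_hats = []
for _ in range(reps):
    resample = rng.choices(sample, k=n)    # draw n observations with replacement
    boot_p_hats.append(sum(resample) / n)  # p-hat for this bootstrap replicate

# The standard deviation of the bootstrap distribution estimates the standard error.
se_boot = statistics.stdev(boot_p_hats)
print(f"bootstrap SE = {se_boot:.3f}")
```

The exact value depends on the seed and the number of replicates, but it should land close to the approximation-based figure.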
How does it compare to our original computational approach using the bootstrap? Well, if we construct the bootstrap distribution for p-hat, then summarize it by finding its standard deviation, we estimate a standard error of about 0.032. Those are remarkably similar values!
10. Sampling distributions
Let's go a step further and take a look at the shape of this bootstrap distribution. A density plot suggests that it's unimodal and symmetric.
11. Sampling distributions
Let's add a layer to this plot that contains the normal curve that's centered at p-hat and uses the equation to find the standard deviation. And yes, let's make that curve purple. We see that the normal approximation looks fairly similar to the density curve of our bootstrap distribution. This will be a recurring theme: that when an approximation method exists, it will tend to give very similar results to the computational method when the assumptions of that approximation are reasonable.
12. Sampling distributions
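The visual comparison between the bootstrap distribution and the normal curve can also be checked numerically. The sketch below, in plain Python rather than the course's plotting code, compares a few quantiles of a bootstrap distribution with the same quantiles of a normal distribution centered at p-hat. The data are again a hypothetical stand-in built from the counts 116 and 35, and the seed, replicate count, and choice of quantiles are assumptions.

```python
import math
import random
from statistics import NormalDist

# Hypothetical stand-in data built from the counts in the narration.
sample = [1] * 116 + [0] * 35
n = len(sample)
p_hat = sum(sample) / n

# Normal approximation: centered at p_hat, standard deviation from the formula.
se_approx = math.sqrt(p_hat * (1 - p_hat) / n)
normal = NormalDist(mu=p_hat, sigma=se_approx)

# Bootstrap distribution of p_hat (arbitrary seed and replicate count).
rng = random.Random(2016)
boot = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(5000))

# Compare empirical bootstrap quantiles with the normal quantiles.
comparison = []
for prob in (0.025, 0.5, 0.975):
    boot_q = boot[int(prob * (len(boot) - 1))]  # simple empirical quantile
    norm_q = normal.inv_cdf(prob)
    comparison.append((prob, boot_q, norm_q))
    print(f"q{prob}: bootstrap = {boot_q:.3f}, normal = {norm_q:.3f}")
```

If the approximation's assumptions are reasonable, the two columns should agree closely, which is the numerical counterpart of the two curves overlapping in the plot.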
13. Let's practice!
Alright, now it's your turn to practice.