Estimating a mean with a t-interval

1. Estimating with the t-interval

In this video we learn how to construct confidence intervals for a population mean using the t-distribution.

2. Quantifying variability of sample means

Let's start with a motivating question: Suppose that among a random sample of 100 people, 13 are left-handed. If you were to select another random sample of 100, would you be surprised if only 12 were left-handed? Probably not... What about 15? Again, probably not... Or 30? Hard to tell... What about 1 or 90? We can probably agree that these look quite unlikely. Clearly, we need a way to quantify how much variability is expected between samples. Bootstrapping is one way to quantify how much sample means vary from one sample to another. Another approach is to use theory, specifically the Central Limit Theorem, to approximate how much sample means vary from one sample to another. This is the focus of this video.
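To get a feel for this sample-to-sample variability, here is a minimal simulation sketch. It assumes, purely for illustration, that the true share of left-handed people in the population is 13%; that value, the seed, and all variable names are assumptions and do not come from the data.

```r
set.seed(1)

# Assumed for illustration only: the true population share of left-handers
p_true <- 0.13
n <- 100            # sample size, as in the motivating question
n_samples <- 10000  # number of simulated random samples

# Number of left-handed people in each simulated sample of 100
counts <- rbinom(n_samples, size = n, prob = p_true)

# Share of simulated samples falling in each range of counts
share_12_to_15 <- mean(counts >= 12 & counts <= 15)
share_30_plus <- mean(counts >= 30)
share_90_plus <- mean(counts >= 90)

# Spread of the counts across samples, next to the binomial formula
sd_counts <- sd(counts)
theory_sd <- sqrt(n * p_true * (1 - p_true))

print(c(share_12_to_15, share_30_plus, share_90_plus))
print(c(simulated = sd_counts, theory = theory_sd))
```

Under this assumption, counts near 13 come up often, while counts of 30 or more are very rare and a count of 90 essentially never appears.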

3. Central Limit Theorem

The Central Limit Theorem states that the distribution of sample means is nearly normal, centered at the population mean, with a standard error equal to the population standard deviation divided by the square root of the sample size. The standard error is defined as the standard deviation of the sampling distribution, which is the distribution of sample means of samples taken from the original population. Note that this is a theoretical distribution that in reality we can't generate, because we don't have the luxury of going back to the original population to take new samples. It's similar in spirit to the bootstrap distribution, but it's different in that the samples are drawn from the population rather than resampled from the original sample. Also, since the population standard deviation is usually unknown, in practice the standard error is estimated by the standard deviation of the sample divided by the square root of the sample size. The additional uncertainty introduced by using the sample standard deviation is accounted for by using a t-distribution with n - 1 degrees of freedom, since, compared to the normal distribution, the t-distribution is more conservative, with thicker tails. And as with most theorems, this one only applies if certain conditions are met.
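The ideas above can be sketched in a small simulation. This is only an illustration under stated assumptions: the population is a made-up, right-skewed one (exponential with mean 5), and the sample size, number of repetitions, and variable names are all hypothetical choices.

```r
set.seed(42)

# Hypothetical right-skewed population (assumption, not real data)
population <- rexp(100000, rate = 1 / 5)
pop_mean <- mean(population)
# Population standard deviation (divide by N, not N - 1)
pop_sd <- sqrt(mean((population - pop_mean)^2))

n <- 50  # sample size (assumed)

# Sampling distribution: means of many samples drawn from the population
sample_means <- replicate(5000, mean(sample(population, n)))

# Standard error predicted by the Central Limit Theorem vs. the simulation
se_theory <- pop_sd / sqrt(n)
se_simulated <- sd(sample_means)

# In practice we have one sample, so we estimate the standard error from it
one_sample <- sample(population, n)
se_estimated <- sd(one_sample) / sqrt(n)

# Critical values for a 95% interval: the t value is larger (thicker tails)
z_star <- qnorm(0.975)
t_star <- qt(0.975, df = n - 1)

print(c(theory = se_theory, simulated = se_simulated, estimated = se_estimated))
print(c(z_star = z_star, t_star = t_star))
```

The simulated standard error should land close to the theoretical one, and the t critical value is a bit larger than the normal one, which is what makes the t-interval slightly wider.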

4. Conditions

Here are the conditions: First, the observations in our sample should be independent of each other with respect to the response variable. This is difficult to verify. However, if the study employs random sampling and/or random assignment, and, for studies that employ random sampling without replacement -- which is almost all polls and surveys -- the sample size is less than 10% of the population, we can be fairly certain that the observations in the sample are independent of each other. Second, the more skewed the original population distribution, the larger the sample size we need. So it's important to plot the distribution of the sample, which is the best window we have into the unknown population, and assess the skewness of the distribution.
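As a rough sketch, the two checks might look like this. The sample here is simulated, and the population size, the variable names, and the moment-based skewness measure are illustrative assumptions rather than anything prescribed in this video.

```r
set.seed(7)

# Hypothetical sample standing in for real data (assumption)
x <- rexp(200, rate = 1 / 5)
n <- length(x)

# Assumed population size, for illustration only
population_size <- 300e6

# 10% condition for sampling without replacement
ten_percent_ok <- n < 0.10 * population_size

# A simple moment-based measure of skewness: about 0 for a symmetric
# sample, positive for a right-skewed one
centered <- x - mean(x)
skewness <- mean(centered^3) / mean(centered^2)^1.5

# Histogram counts of the sample; drop plot = FALSE to draw the plot
h <- hist(x, plot = FALSE)

print(ten_percent_ok)
print(skewness)
print(h$counts)
```

A clearly positive skewness together with a small sample would be a reason for caution; with a larger sample the same skew is less of a concern.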

5. Confidence interval for a mean

Let's use data from the 2010 General Social Survey to estimate the number of days per month Americans work extra hours beyond their usual schedule. The data are stored in a dataframe called gss and the variable of interest is called moredays.

6. Confidence interval for a mean

We can create the confidence interval using the t.test function -- don't worry about the word "test" for now -- by passing the variable as the first argument and setting the confidence level with the conf.level argument. The function output has some extraneous information, but what we need is listed right beneath "95 percent confidence interval". Based on this output, we are 95% confident that the average number of days per month Americans work extra hours beyond their usual schedule is between 5.27 and 6.15. That's roughly a quarter of the work days in a typical month!
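Here is a sketch of what such a call could look like, together with the same interval computed by hand. The gss data frame below is a simulated stand-in created only so the example runs on its own; its values are made up, so the resulting interval will not match the 5.27 to 6.15 quoted above.

```r
set.seed(2010)

# Hypothetical stand-in for the gss data frame: simulated values,
# NOT the real General Social Survey data
gss <- data.frame(moredays = rpois(1000, lambda = 5.7))

# Confidence interval from t.test: variable first, then the confidence level
result <- t.test(gss$moredays, conf.level = 0.95)
ci_from_t_test <- result$conf.int

# The same interval by hand: mean +/- t* times the estimated standard error
x <- gss$moredays
n <- length(x)
se <- sd(x) / sqrt(n)            # estimated standard error
t_star <- qt(0.975, df = n - 1)  # critical value for 95% confidence
ci_by_hand <- mean(x) + c(-1, 1) * t_star * se

print(ci_from_t_test)
print(ci_by_hand)
```

The two intervals agree, which shows that the confidence interval reported by t.test is the sample mean plus or minus the t critical value times the estimated standard error.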

7. Let's practice!

Time to put this into practice.
