1. Inference for a mean
Great work so far! In this video, we will review Student's t-test.
The t-test helps us make inferences about a population based on a sample.
2. Inference for a mean
A lot of companies reason from samples, so interviewers might be interested in checking your understanding of statistical inference.
For example, pharmaceutical companies use statistical inference to assess the impact of a drug on all patients based on a limited number of observations.
3. Inference for a mean
In this lesson, we will make inferences about a population's mean based on a sample using a t-test. More specifically, we will calculate a confidence interval and test if the population's mean equals a given value.
4. Assumptions
The t-test assumes that the underlying data are normally distributed. Recall from the central limit theorem that the distribution of the sample mean converges to a normal distribution as the sample size grows, even when the observations themselves are not normally distributed, so the test remains approximately valid for large samples. Additionally, the t-test requires the sample to be random and the observations to be independent.
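To make the central limit theorem concrete, here is a minimal simulation sketch. The exponential population, the sample size of 50, and the number of repetitions are arbitrary choices for illustration, not values from the lesson.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical skewed population: exponential with mean 1 and sd 1
n_obs <- 50     # observations per sample
n_rep <- 2000   # number of repeated samples

# Mean of each repeated sample
sample_means <- replicate(n_rep, mean(rexp(n_obs, rate = 1)))

# The sample means centre on the population mean (1),
# with spread close to sd / sqrt(n) = 1 / sqrt(50)
mean(sample_means)
sd(sample_means)
```

Calling hist(sample_means) on the result should show a roughly bell-shaped histogram, even though the raw observations are strongly skewed.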
5. Confidence interval
Let's review confidence intervals. Imagine that we draw a sample from the following population.
6. Confidence interval
Knowing that the underlying data follow a normal distribution, we can calculate a confidence interval: a range, computed from the sample, that is constructed to contain the population's mean with a given level of confidence.
7. Confidence interval
As the sample size increases, the confidence interval narrows,
8. Confidence interval
because we can estimate the population's mean with higher precision.
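The narrowing can be checked with a short sketch. The normal population (mean 10, standard deviation 2), the sample sizes, and the helper ci_width are made up for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Hypothetical helper: width of the 95% confidence interval for a
# sample of size n drawn from a normal population (mean 10, sd 2)
ci_width <- function(n) {
  x <- rnorm(n, mean = 10, sd = 2)
  ci <- t.test(x)$conf.int
  ci[2] - ci[1]
}

sizes <- c(10, 100, 1000)
widths <- sapply(sizes, ci_width)

# Larger samples give narrower intervals
data.frame(n = sizes, width = widths)
```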
9. 95% confidence interval
Make sure that you can precisely define a confidence interval during the interview. Let's take a 95% confidence interval, for example. Suppose we draw 100 different samples from the same population.
10. 95% confidence interval
For each of them, we compute a 95% confidence interval.
11. 95% confidence interval
We check if the confidence interval contains the true mean.
12. 95% confidence interval
Approximately 95 of the 100 confidence intervals will contain the true mean value. In practice, however, we select one random sample and generate one confidence interval, which may or may not contain the true mean.
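This repeated-sampling definition can be simulated directly. The sketch below assumes a hypothetical normal population with a true mean of 5 and a standard deviation of 2, and samples of 30 observations. The exact count varies from run to run, but it should be close to 95.

```r
set.seed(123)  # arbitrary seed for reproducibility

true_mean <- 5    # hypothetical population mean
n_samples <- 100  # number of samples drawn
n_obs <- 30       # observations per sample

# For each sample, check whether its 95% CI contains the true mean
covered <- replicate(n_samples, {
  x <- rnorm(n_obs, mean = true_mean, sd = 2)
  ci <- t.test(x)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

sum(covered)  # number of intervals that contain the true mean
```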
13. One-sample t-test
We can also use a sample to test whether the population's mean equals a given value. The null hypothesis of the one-sample t-test states that the population's mean equals that value. The alternative hypothesis is that the population's mean differs from it.
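As a rough sketch of what the test computes, the t statistic is the distance between the sample mean and the hypothesized value, measured in standard errors. The two-sided p-value then comes from the t distribution with n - 1 degrees of freedom. The sample x and the hypothesized value mu0 below are made up for illustration.

```r
x <- c(2.1, 1.8, 2.6, 2.3, 1.9, 2.4, 2.0, 2.7)  # made-up sample
mu0 <- 2                                        # hypothesized mean

n <- length(x)

# t statistic: (sample mean - hypothesized mean) / standard error
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

t_stat
p_value
```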
14. t-test in R
To perform a t-test in R, you can apply the t.test function to sample data. The output of the test contains several pieces of information.
15. t-test in R
By default, the function tests whether the population's mean equals zero. The output of the test contains the p-value. It's crucial to know how to interpret a p-value during the interview. Recall that the p-value is the probability of seeing sample data at least as extreme as the data observed, assuming that the null hypothesis is true.
16. t-test in R
The function also prints out a 95% confidence interval.
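A minimal sketch of the default call follows. The sample x is simulated here only so that the example runs on its own. The names p.value, conf.int, and estimate are components of the object returned by t.test.

```r
set.seed(7)                         # arbitrary seed
x <- rnorm(25, mean = 0.5, sd = 1)  # made-up sample data

res <- t.test(x)  # default null hypothesis: the true mean equals 0
res               # prints the statistic, df, p-value, CI and sample mean

res$p.value       # p-value of the test
res$conf.int      # confidence interval (95% by default)
res$estimate      # sample mean
```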
17. t-test in R
You can change the hypothesized mean by setting the mu parameter.
Note that the alternative hypothesis is now that the true mean is not equal to 2, rather than not equal to 0.
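For instance, assuming a made-up sample x, the call could look like this:

```r
set.seed(7)                         # arbitrary seed
x <- rnorm(25, mean = 2.3, sd = 1)  # made-up sample data

# Test the null hypothesis that the true mean equals 2
res_mu2 <- t.test(x, mu = 2)

res_mu2$null.value   # hypothesized mean used by the test
res_mu2$alternative  # "two.sided" unless changed
res_mu2$p.value
```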
18. t-test in R
To change the level of the confidence interval, set the conf.level parameter to the chosen value, expressed as a proportion. Here, we set it to 0.9, that is, 90%.
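A short sketch with made-up data compares the default 95% interval with a 90% one. A lower confidence level yields a narrower interval for the same sample.

```r
set.seed(7)                         # arbitrary seed
x <- rnorm(25, mean = 2.3, sd = 1)  # made-up sample data

ci95 <- t.test(x, mu = 2)$conf.int                    # default level
ci90 <- t.test(x, mu = 2, conf.level = 0.9)$conf.int  # 90% level

ci95
ci90

# The 90% interval is narrower than the 95% interval
(ci90[2] - ci90[1]) < (ci95[2] - ci95[1])
```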
19. Summary
Let's summarize. We've covered the assumptions of a t-test, confidence intervals, the one-sample t-test, and the t.test function in R.
20. Let's practice!
Let's practice using t-tests in R!