1. Normality tests
Hi all, and welcome to chapter 3. In this chapter, we will review statistical tests, starting with normality tests.
2. Normal distribution
During an interview, you might be asked to carry out a statistical test. Many statistical tests, such as the t-test, assume that the data are normally distributed. You should show that you're aware of these assumptions and that you can check whether they hold.
3. Testing normality
How can we check whether a given sample comes from a normally distributed population? To answer this question, you can use statistical tests, such as the Shapiro-Wilk test or the Kolmogorov-Smirnov test, or a visual method, such as the Q-Q plot.
4. Shapiro-Wilk test
The Shapiro-Wilk test checks whether a random sample comes from a normally distributed population. Its null hypothesis states that the data are normally distributed, and its alternative hypothesis states that they are not. You decide whether to reject the null hypothesis based on the p-value.
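For reference, the Shapiro-Wilk statistic is commonly written as below, where x_(i) is the i-th smallest value in a sample of size n, x̄ is the sample mean, and the constants a_i are derived from the expected values and covariance matrix of order statistics of a standard normal sample.

```latex
W = \frac{\left(\sum_{i=1}^{n} a_i \, x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```

W lies between 0 and 1. Values close to 1 are consistent with normality, while small values count as evidence against the null hypothesis.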
5. Shapiro-Wilk test
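We focus on the Shapiro-Wilk test rather than other normality tests because comparative studies have found that, among commonly used normality tests, it has the highest power (the best chance of detecting non-normal data) for a given significance level.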
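The power comparison can be illustrated, though not proven, with a small simulation. This is a sketch under assumptions chosen only for illustration: right-skewed exponential data, n = 30, a significance level of 0.05, and 1000 replications. The Kolmogorov-Smirnov test here uses a mean and standard deviation estimated from the same sample, as is common in practice; results will differ for other sample sizes and departures from normality.

```r
# Illustrative power comparison (assumed setup: exponential data, n = 30, alpha = 0.05)
set.seed(123)
n_reps <- 1000
n <- 30
alpha <- 0.05

reject_sw <- logical(n_reps)
reject_ks <- logical(n_reps)

for (i in seq_len(n_reps)) {
  x <- rexp(n)  # clearly non-normal (right-skewed) data, so H0 is false
  reject_sw[i] <- shapiro.test(x)$p.value < alpha
  # KS test against a normal distribution whose mean and sd are estimated
  # from the same sample; this makes the KS p-value conservative
  reject_ks[i] <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value < alpha
}

# Power = proportion of replications in which the false H0 was rejected
power_sw <- mean(reject_sw)
power_ks <- mean(reject_ks)
cat("Shapiro-Wilk power:", power_sw, "\n")
cat("Kolmogorov-Smirnov power:", power_ks, "\n")
```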
6. P-value
Let's take a quick break to review the p-value, or probability value.
The p-value measures the evidence against the null hypothesis.
It is the probability of observing a result at least as extreme as the one we actually observed, assuming that the null hypothesis is true.
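As a worked formula: if T is a test statistic for which larger values are more extreme (stronger evidence against the null hypothesis H_0), and t_obs is the value computed from the sample, the p-value is

```latex
p = P_{H_0}\left(T \ge t_{\text{obs}}\right)
```

For the Shapiro-Wilk statistic W, it is small values that are extreme, so the inequality flips: p = P_{H_0}(W ≤ w_obs).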
7. Shapiro-Wilk test
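By convention, the null hypothesis is rejected when the p-value is below the chosen significance level, most often 0.05 (5%). For a normality test, rejecting the null hypothesis means there is evidence that the data do not come from a normally distributed population. A p-value above 0.05 does not prove normality; it only means the test found no evidence against it.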
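A minimal sketch of this decision rule in R, assuming the conventional 0.05 threshold. The helper decide_normality is hypothetical (written for this example only); it simply wraps shapiro.test and compares the p-value with alpha. The simulated datasets are illustrative, not real data.

```r
set.seed(2024)
normal_data <- rnorm(100, mean = 10, sd = 2)  # drawn from a normal population
skewed_data <- rexp(100, rate = 1)            # drawn from a right-skewed population
alpha <- 0.05                                 # assumed significance level

# Hypothetical helper: runs shapiro.test and applies the "p < alpha" rule
decide_normality <- function(x, alpha = 0.05) {
  p <- shapiro.test(x)$p.value  # shapiro.test accepts between 3 and 5000 values
  decision <- if (p < alpha) {
    "reject H0 (evidence of non-normality)"
  } else {
    "fail to reject H0 (no evidence against normality)"
  }
  list(p_value = p, decision = decision)
}

res_normal <- decide_normality(normal_data, alpha)
res_skewed <- decide_normality(skewed_data, alpha)
print(res_normal)
print(res_skewed)
```

Note that the result for normal_data can still be a rejection about 5% of the time by chance, which is exactly what a 0.05 significance level allows.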
8. Kolmogorov-Smirnov test
The Kolmogorov-Smirnov test checks whether a sample follows a reference distribution by comparing the sample's empirical cumulative distribution function (CDF) with the CDF of that reference distribution.
9. Kolmogorov-Smirnov test
If the reference distribution is normal, the test assesses whether the dataset is normally distributed.
10. Kolmogorov-Smirnov test
The null hypothesis of the Kolmogorov-Smirnov test states that the sample comes from the reference distribution; the alternative hypothesis states that it does not.
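The Kolmogorov-Smirnov statistic D is the largest vertical distance between the sample's empirical CDF F_n and the reference CDF F, that is, D = sup over x of |F_n(x) - F(x)|. The sketch below computes D by hand against a standard normal reference and checks it against the statistic reported by ks.test; the sample is simulated purely for illustration.

```r
set.seed(7)
x <- rnorm(40)
n <- length(x)
xs <- sort(x)
F_ref <- pnorm(xs)  # reference CDF (standard normal) at each sorted value

# The empirical CDF jumps from (i - 1)/n to i/n at the i-th sorted value,
# so the largest gap occurs just before or just after one of these jumps.
d_plus  <- max(seq_len(n) / n - F_ref)
d_minus <- max(F_ref - (seq_len(n) - 1) / n)
D_manual <- max(d_plus, d_minus)

D_r <- unname(ks.test(x, "pnorm")$statistic)
cat("manual D:", D_manual, " ks.test D:", D_r, "\n")
```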
11. Q-Q plot
To check the normality of data, we can also use visual tools such as a Q-Q (quantile-quantile) plot. A Q-Q plot helps you judge whether two datasets come from populations with a common distribution.
12. Q-Q plot
The Q-Q plot compares the quantiles of a theoretical distribution on the horizontal axis
13. Q-Q plot
to the quantiles of the sample data on the vertical axis.
14. Q-Q plot
If the two datasets come from the same distribution, the points in the Q-Q plot lie approximately on a straight line.
It's a good idea to combine several methods, such as statistical tests and visual checks, to make your conclusion more reliable.
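To see what a normal Q-Q plot actually draws, here is a sketch that builds the points by hand: sorted sample values against standard normal quantiles at the plotting positions returned by ppoints. It then confirms that qqnorm (with plot.it = FALSE, so nothing is drawn) returns the same points. The correlation check is only an informal summary of "close to a straight line" chosen for this example, not a formal test.

```r
set.seed(11)
x <- rnorm(60, mean = 5, sd = 3)  # simulated normal sample
n <- length(x)

sample_q <- sort(x)                 # sample quantiles (vertical axis)
theoretical_q <- qnorm(ppoints(n))  # standard normal quantiles (horizontal axis)

# qqnorm uses the same plotting positions; plot.it = FALSE returns the points only
qq <- qqnorm(x, plot.it = FALSE)

# For normal data the points lie close to a straight line (slope near the sd,
# intercept near the mean), so the two sets of quantiles are highly correlated.
r <- cor(theoretical_q, sample_q)
cat("correlation of Q-Q points:", r, "\n")
```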
15. Transforming data for normality
If a method requires normally distributed data but yours are not, you can try transforming them.
16. Transforming data for normality
If your data are right-skewed and strictly positive, taking the logarithm often produces a more bell-shaped distribution.
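A small sketch of the log transformation, assuming simulated log-normal data (strictly positive and right-skewed). The helper skewness is hypothetical, a simple moment-based estimator written for this example rather than a function from base R. It compares skewness and the Shapiro-Wilk p-value before and after taking logs.

```r
set.seed(99)
x <- rlnorm(200, meanlog = 0, sdlog = 1)  # strictly positive, right-skewed data

# Hypothetical helper: moment-based sample skewness (third standardized moment)
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

log_x <- log(x)

cat("skewness before:", skewness(x), " after:", skewness(log_x), "\n")
p_before <- shapiro.test(x)$p.value
p_after  <- shapiro.test(log_x)$p.value
cat("Shapiro-Wilk p-value before:", p_before, " after:", p_after, "\n")
```

Log-normal data become exactly normal after a log transform, which is why this example works so cleanly; real skewed data usually improve less.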
17. Checking normality in R
To carry out the Shapiro-Wilk test in R, use the shapiro.test function.
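The ks.test function performs the Kolmogorov-Smirnov test. To test against the normal distribution, set the y argument to "pnorm". Note that pnorm defaults to a mean of 0 and a standard deviation of 1, so pass the mean and sd of the normal distribution you want to compare against as additional arguments, or standardize the data first.
To draw a Q-Q plot, use the qqnorm function. The qqline function then adds a reference line, which by default passes through the first and third quartiles.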
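Putting these functions together, a minimal sketch on simulated data (the sample and its parameters are assumptions for illustration). Extra arguments to ks.test are passed on to pnorm; here the mean and sd are estimated from the same sample, which makes the KS p-value only approximate (conservative), a problem the Lilliefors variant of the test is designed to correct.

```r
set.seed(5)
x <- rnorm(80, mean = 50, sd = 10)  # simulated sample

# Shapiro-Wilk test
sw <- shapiro.test(x)

# Kolmogorov-Smirnov test against a normal with the sample's mean and sd
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

# Without mean and sd, the data are compared with N(0, 1) instead,
# which rejects normality here only because the location and scale differ
ks_standard <- ks.test(x, "pnorm")

print(sw)
print(ks)
print(ks_standard)

# Normal Q-Q plot with a reference line through the first and third quartiles
qqnorm(x, main = "Normal Q-Q plot")
qqline(x, col = "red")
```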
18. Summary
To summarize, we've covered two normality tests: the Shapiro-Wilk test and the Kolmogorov-Smirnov test. Along the way, we reviewed what a p-value is. We also talked about Q-Q plots, data transformation, and how to check normality in R.
19. Let's practice!
Now it's time for you to practice!