Normality tests

1. Normality tests

Hi all, and welcome to chapter 3. In this chapter, we will review statistical tests, starting with normality tests.

2. Normal distribution

During an interview, you might be asked to carry out a statistical test. Normality is an assumption of many statistical tests, such as the t-test. You should show that you're aware of these assumptions and that you can test whether they hold.

3. Testing normality

How can we know that a given sample comes from a normally distributed population? To answer this question, you may consider statistical tests, such as the Shapiro-Wilk test or the Kolmogorov-Smirnov test. You can also use a visual measure, such as the Q-Q plot.

4. Shapiro-Wilk test

The Shapiro-Wilk test checks if a random sample comes from a normally distributed population. The null hypothesis of the Shapiro-Wilk test states that the data is normally distributed. The alternative hypothesis states that the data is not normally distributed. You can assess the hypothesis based on the p-value.

5. Shapiro-Wilk test

We review the Shapiro-Wilk test rather than other tests because comparisons of normality tests have generally found that it has the best power for a given significance level.

6. P-value

Let's take a quick break to review the p-value, or the probability value. The p-value is a tool to challenge the null hypothesis. Its value expresses the probability of observing a result at least as extreme as what we've observed, assuming that the null hypothesis is true.

7. Shapiro-Wilk test

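Commonly, when the p-value is less than 0.05 (5%), the null hypothesis gets rejected. When performing a test of normality, rejecting the null hypothesis means that there is evidence that the data is not from a normally distributed population. A p-value above the threshold does not prove normality; it only means that the test found no evidence against it.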
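To make the decision rule concrete, here is a minimal sketch in R. The two samples are simulated, and the variable names and the 0.05 threshold are assumptions chosen for illustration, not part of the lesson's own material.

```r
# Simulated data: the names and parameters here are illustrative assumptions.
set.seed(42)
normal_sample <- rnorm(100, mean = 50, sd = 10)  # drawn from a normal distribution
skewed_sample <- rexp(100, rate = 1)             # drawn from an exponential distribution

alpha <- 0.05  # conventional significance level

# shapiro.test() returns an "htest" object; the p-value is stored in $p.value.
normal_result <- shapiro.test(normal_sample)
skewed_result <- shapiro.test(skewed_sample)

# TRUE means "reject the null hypothesis of normality" at the chosen level.
reject_normal <- normal_result$p.value < alpha
reject_skewed <- skewed_result$p.value < alpha

print(normal_result)
print(skewed_result)
```

With a strongly skewed sample of this size, the test should reject normality. For the normal sample, a rejection can still happen by chance about 5% of the time.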

8. Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test checks whether the distribution of a sample fits the cumulative distribution function of a reference distribution.

9. Kolmogorov-Smirnov test

If the reference distribution is normal, the test helps to assess the normality of a dataset.

10. Kolmogorov-Smirnov test

The null hypothesis of the Kolmogorov-Smirnov test states that the sample distribution is identical to the reference distribution.
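One way to see what the test measures is to compute its statistic by hand: the largest vertical gap between the sample's empirical cumulative distribution function and the reference one. The sketch below assumes a simulated sample and the standard normal as the reference; the variable names are hypothetical.

```r
# Simulated sample; the standard normal is the assumed reference distribution.
set.seed(1)
x <- rnorm(50)

n <- length(x)
x_sorted <- sort(x)
ref_cdf <- pnorm(x_sorted)  # reference CDF evaluated at each ordered observation

# The empirical CDF jumps at every observation, so the gap is checked
# just after the jump (i / n) and just before it ((i - 1) / n).
d_plus  <- max(seq_len(n) / n - ref_cdf)
d_minus <- max(ref_cdf - (seq_len(n) - 1) / n)
d_manual <- max(d_plus, d_minus)

# The built-in test should report the same statistic.
ks_result <- ks.test(x, "pnorm")
d_builtin <- unname(ks_result$statistic)

print(c(manual = d_manual, builtin = d_builtin))
```

A small statistic means the two curves stay close together, which is what we expect when the null hypothesis holds.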

11. Q-Q plot

To check the normality of data, we can also use visual methods such as a Q-Q plot, short for quantile-quantile plot. A Q-Q plot helps us judge whether two datasets come from populations with a common distribution.

12. Q-Q plot

The Q-Q plot compares the quantiles of a theoretical distribution on the horizontal axis

13. Q-Q plot

to a sample of data on the vertical axis.

14. Q-Q plot

If the two sets of quantiles that we compare come from the same distribution, the points in the Q-Q plot will lie approximately on a straight line. It's a good idea to combine various methods, such as statistical tests and visual checks, to be more confident in the result.
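The construction of a normal Q-Q plot can be sketched by hand in a few lines of R. The sample here is simulated and the variable names are assumptions; the correlation at the end is just one rough numeric summary of how straight the points are, not a formal test.

```r
# Simulated sample, for illustration only.
set.seed(7)
sample_data <- rnorm(200, mean = 10, sd = 2)

n <- length(sample_data)
theoretical_q <- qnorm(ppoints(n))  # standard normal quantiles (horizontal axis)
sample_q <- sort(sample_data)       # ordered sample values (vertical axis)

plot(theoretical_q, sample_q,
     xlab = "Theoretical quantiles",
     ylab = "Sample quantiles",
     main = "Normal Q-Q plot built by hand")

# For normally distributed data the points fall close to a straight line,
# so the correlation between the two sets of quantiles is close to 1.
qq_correlation <- cor(theoretical_q, sample_q)
print(qq_correlation)
```

Note that the sample does not need mean 0 and standard deviation 1: a different mean and spread only change the intercept and slope of the line, not its straightness.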

15. Transforming data for normality

If you need normally distributed data, but yours is not, you can try to transform it.

16. Transforming data for normality

If the distribution of your data is right-skewed and all values are positive, you can apply a logarithm to it, which often gives a more bell-shaped distribution.
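As a small illustration of the log transformation, the sketch below uses simulated log-normal data, which is right-skewed and strictly positive by construction. That choice is an assumption made so the effect is easy to see; real data will rarely transform this cleanly.

```r
# Simulated right-skewed, strictly positive data (log-normal by construction).
set.seed(123)
skewed <- rlnorm(200, meanlog = 0, sdlog = 1)

# Natural logarithm; this only works when every value is positive.
log_transformed <- log(skewed)

# Compare Shapiro-Wilk p-values before and after the transformation.
p_before <- shapiro.test(skewed)$p.value
p_after  <- shapiro.test(log_transformed)$p.value
print(c(before = p_before, after = p_after))

# Histograms side by side to compare the shapes visually.
old_par <- par(mfrow = c(1, 2))
hist(skewed, main = "Original", xlab = "Value")
hist(log_transformed, main = "Log-transformed", xlab = "log(Value)")
par(old_par)
```

If the data contains zeros or negative values, a plain logarithm is not defined, so a different transformation would be needed.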

17. Checking normality in R

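To carry out the Shapiro-Wilk test in R, use the shapiro.test function. The ks.test function performs the Kolmogorov-Smirnov test. You need to set the y parameter to pnorm to perform the test against the normal distribution. By default, that means the standard normal, so pass mean and sd as well if your reference has different parameters. To draw a Q-Q plot, you can use the qqnorm function. qqline adds a reference line to the plot, which by default passes through the first and third quartiles.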
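Put together, the calls might look like the following sketch. The data is simulated, and passing the sample mean and standard deviation to ks.test is an assumption made here for convenience. Estimating those parameters from the same data makes the Kolmogorov-Smirnov p-value only approximate.

```r
# Simulated data, for illustration only.
set.seed(2024)
x <- rnorm(100, mean = 5, sd = 2)

# Shapiro-Wilk test: the null hypothesis is that x is normally distributed.
sw <- shapiro.test(x)
print(sw)

# Kolmogorov-Smirnov test against a normal distribution.
# Extra arguments (mean, sd) are passed on to pnorm; without them
# the sample would be compared against the standard normal.
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
print(ks)

# Q-Q plot against the normal distribution, with a reference line.
qqnorm(x)
qqline(x)
```

Both test functions return an object whose p.value element can be compared with the chosen significance level.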

18. Summary

To summarize, we've covered two normality tests: the Shapiro-Wilk test and the Kolmogorov-Smirnov test. On the way, we reviewed what a p-value is. We also talked about Q-Q plots, data transformation, and how to check normality in R.

19. Let's practice!

Now it's time for you to practice!