
Descriptive statistics

1. Descriptive statistics

Welcome to the second chapter of the course!

2. Descriptive statistics

In the last chapter, we covered probability distributions. Now, we will brush up on our exploratory data analysis skills. Let's start with descriptive statistics.

3. Descriptive statistics

Interviewers may want to check whether you can quickly familiarize yourself with a new dataset. Descriptive statistics help you understand the main features of the data.

4. Descriptive statistics

During interviews, two main types of statistical measures pop up: central tendency measures and variability measures.

5. Descriptive statistics

Let's start by reviewing central tendency measures.

6. Central tendency measures

The most common central tendency measures are mean and median. You may encounter mode as well. In the example, we have a sample of 6 values arranged from the lowest to the highest.

7. Central tendency measures

The mean is the sum of the values divided by the number of values.
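As a quick illustration, here is a minimal sketch of that calculation. The six values below are hypothetical, chosen only for this example; they are not the ones shown on the slide.

```python
# Hypothetical sample of six values, sorted from lowest to highest.
sample = [1, 3, 4, 6, 8, 14]

# Mean: the sum of the values divided by the number of values.
sample_mean = sum(sample) / len(sample)

print(sample_mean)  # 6.0
```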

8. Central tendency measures

The median is the value separating the upper half from the lower half of a sample. In a data set with an odd number of values, the median is the middle value.

9. Central tendency measures

If you have 5 values and put them in order,

10. Central tendency measures

the median will be the third number

11. Central tendency measures

since two numbers are lower than it

12. Central tendency measures

and two numbers are higher than it.

13. Central tendency measures

If there is an even number of observations in a data set, then there is no single middle value. The median is the mean of the two middle values.
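Both cases can be sketched in a few lines. The helper name `median_by_hand` and the sample values are assumptions made up for this illustration; the result is compared against `statistics.median` from the standard library.

```python
import statistics


def median_by_hand(values):
    """Return the median: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: a single middle value exists.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [3, 1, 7, 5, 9]        # hypothetical, 5 values
even_sample = [1, 3, 4, 6, 8, 14]   # hypothetical, 6 values

print(median_by_hand(odd_sample))   # 5
print(median_by_hand(even_sample))  # 5.0

# The standard library agrees with the hand-written version.
print(statistics.median(odd_sample), statistics.median(even_sample))
```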

14. Central tendency measures

The mode is simply the value

15. Central tendency measures

that appears most often.
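One way to find the mode is to count how often each value occurs. This is a minimal sketch with hypothetical values; `statistics.multimode` (available from Python 3.8) is shown as well because a sample can have more than one most frequent value.

```python
import statistics
from collections import Counter

values = [2, 3, 3, 5, 3, 7, 5]  # hypothetical sample

# Count occurrences and take the most frequent value.
counts = Counter(values)
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)  # 3 3

# With ties, multimode returns every value that shares the top count.
tied = [1, 1, 2, 2, 3]
print(statistics.multimode(tied))  # [1, 2]
```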

16. Central tendency measures - plot

On a density plot, the mode is the peak of the distribution.

17. Central tendency measures - plot

The median splits the area under the density plot into two equal parts.

18. Central tendency measures - plot

And the mean is simply the average value.

19. Skewness

In the case of a perfectly symmetrical distribution with a single peak, all three central tendency measures are the same. When the distribution is not symmetrical, it is skewed.

20. Skewness

We say that the distribution is skewed to the right when data piles up on the left.

21. Skewness

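The distribution is skewed to the left when data piles up on the right. Take a look at the relationship between mean and median in skewed distributions: the long tail pulls the mean toward it, so the mean typically lies above the median in a right-skewed distribution and below it in a left-skewed one. Interviewers may check your knowledge of central tendency measures by asking a question about skewness.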
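To see how the mean and median relate under skew, here is a small sketch with two hypothetical samples, one with a long right tail and one with a long left tail. The numbers are invented for illustration, and the pattern shown is typical rather than guaranteed for every data set.

```python
import statistics

# Hypothetical right-skewed sample: most values are low, one is far to the right.
right_skewed = [1, 2, 2, 3, 3, 4, 20]

# Hypothetical left-skewed sample: most values are high, one is far to the left.
left_skewed = [1, 17, 18, 18, 19, 19, 20]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name}: mean={mean}, median={median}")

# right-skewed: the extreme high value pulls the mean (5) above the median (3).
# left-skewed: the extreme low value pulls the mean (16) below the median (18).
```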

22. Descriptive statistics

Variability is the second feature of data that often appears during interviews. Let's quickly recall what variability is.

23. Variability

Take a look at the two distributions in the graph.

24. Variability

The green distribution is more concentrated around the mean; hence, its variability is lower.

25. Variability

The blue distribution is flatter. The data is more variable.

26. Variability measures

The most common variability measures are the variance and the standard deviation. You may come across the range as well, but it is used less often.

27. Variability measures

To derive the variance, you calculate the difference between each observation's value and the overall mean, square those differences, and take their average.
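Written as a formula, that description corresponds to the population variance. The symbols here are assumptions introduced for this sketch: $x_i$ for the observations, $n$ for their count, and $\bar{x}$ for their mean.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,
\qquad
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2
```

Note that many tools also offer a sample variance that divides by $n - 1$ instead of $n$; the definition in this video divides by the number of values.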

28. Variance - numerical example

Let's calculate the variance by hand on a simple example. There are three values: 2, 5, 11. First, we calculate the mean, which amounts to 6. We subtract the mean from each of the values and square the differences. In the next step, we calculate the sum of the squared differences and divide it by the number of values. The variance amounts to 14.
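The same steps can be reproduced in Python. This is a minimal sketch that follows the hand calculation for the values 2, 5, and 11; the variable names are made up for the example. It also shows that `statistics.pvariance` matches this definition, while `statistics.variance` divides by n - 1 and therefore gives a different number.

```python
import statistics

values = [2, 5, 11]

# Step 1: the mean.
mean = sum(values) / len(values)                      # 6.0

# Step 2: subtract the mean from each value and square the differences.
squared_diffs = [(x - mean) ** 2 for x in values]     # [16.0, 1.0, 25.0]

# Step 3: sum the squared differences and divide by the number of values.
variance = sum(squared_diffs) / len(values)           # 14.0

print(mean, squared_diffs, variance)

# Population variance (divide by n) matches the hand calculation.
print(statistics.pvariance(values))  # 14
# Sample variance (divide by n - 1) does not.
print(statistics.variance(values))   # 21
```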

29. Variability measures

The second variability measure is standard deviation. The standard deviation is simply the square root of the variance.
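Continuing with the values 2, 5, and 11, whose variance is 14, the standard deviation can be sketched as follows. The comparison with `statistics.pstdev` assumes the population definition (dividing by the number of values).

```python
import math
import statistics

values = [2, 5, 11]

# Population variance: average squared distance from the mean.
mean = sum(values) / len(values)
variance = sum((x - mean) ** 2 for x in values) / len(values)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(round(std_dev, 4))                    # 3.7417
print(round(statistics.pstdev(values), 4))  # 3.7417
```

Because it is a square root of squared units, the standard deviation is expressed in the same units as the data, which makes it easier to interpret than the variance.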

30. Variability measures

And the range is the difference between the maximum and the minimum value.
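For the same three values used in the variance example, the range takes one line. This is only a sketch; the variable name `value_range` is an assumption chosen to avoid shadowing Python's built-in `range`.

```python
values = [2, 5, 11]

# Range: largest value minus smallest value.
value_range = max(values) - min(values)

print(value_range)  # 9
```

Since the range depends only on the two extreme values, a single outlier can change it substantially.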

31. Summary

Let's wrap up. In this video, we've covered central tendency measures: mean, median, and mode. Then we discussed skewness. We have also gone over variability measures, including variance, standard deviation, and range.

32. Let's practice!

Let's practice descriptive statistics before the interview!