1. Descriptive statistics
Welcome to the second chapter of the course!
2. Descriptive statistics
In the last chapter, we covered probability distributions.
Now, we will brush up on our exploratory data analysis skills.
Let's start with descriptive statistics.
3. Descriptive statistics
Interviewers may want to check whether you can quickly familiarize yourself with a new dataset.
Descriptive statistics help you understand the main features of the data.
4. Descriptive statistics
During interviews, two main types of statistical measures pop up: central tendency measures and variability measures.
5. Descriptive statistics
Let's start by reviewing central tendency measures.
6. Central tendency measures
The most common central tendency measures are the mean and the median.
You may encounter the mode as well.
In the example, we have a sample of 6 values arranged from the lowest to the highest.
7. Central tendency measures
The mean is the sum of the values divided by the number of values.
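As a minimal sketch, the definition translates directly into Python. The helper `mean_by_hand` and the sample values are hypothetical, chosen only for illustration; `statistics.mean` from the standard library is used as a cross-check.

```python
from statistics import mean

def mean_by_hand(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty sample is undefined")
    return sum(values) / len(values)

# Hypothetical sample, used only for illustration.
sample = [2, 5, 11]

print(mean_by_hand(sample))  # 6.0
print(mean(sample))          # standard-library result, same value
```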
8. Central tendency measures
The median is the value separating the upper half from the lower half of a sample. In a dataset with an odd number of values, the median is the middle value.
9. Central tendency measures
If you have 5 values and put them in order,
10. Central tendency measures
the median will be the third number
11. Central tendency measures
since two numbers are lower than it
12. Central tendency measures
and two numbers are higher than it.
13. Central tendency measures
If there is an even number of observations in a data set, then there is no single middle value. The median is the mean of the two middle values.
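Both cases, odd and even counts, can be sketched in a few lines. The function `median_by_hand` and the two samples are hypothetical; `statistics.median` from the standard library applies the same rule and serves as a check.

```python
from statistics import median

def median_by_hand(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    if not values:
        raise ValueError("median of an empty sample is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value.
        return ordered[mid]
    # Even count: no single middle value, so average the two middle ones.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [7, 1, 3, 9, 5]        # sorted: 1, 3, 5, 7, 9
even_sample = [7, 1, 3, 9, 5, 11]   # sorted: 1, 3, 5, 7, 9, 11

print(median_by_hand(odd_sample))   # 5
print(median_by_hand(even_sample))  # 6.0
print(median(even_sample))          # 6.0
```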
14. Central tendency measures
The mode is simply the value
15. Central tendency measures
that appears most often.
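A minimal sketch of the mode, assuming we want every most-frequent value when there is a tie. The helper `modes_by_hand` and the sample are hypothetical; `statistics.multimode` (Python 3.8+) returns the same list.

```python
from collections import Counter
from statistics import multimode

def modes_by_hand(values):
    """Return all values that appear most often, in order of first appearance."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

sample = [3, 1, 3, 2, 4, 3, 2]
print(modes_by_hand(sample))   # [3]
print(multimode(sample))       # [3]

# With a tie, a sample can have more than one mode.
print(modes_by_hand([1, 2, 2, 1, 3]))  # [1, 2]
```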
16. Central tendency measures - plot
On a density plot, the mode is the peak of the distribution.
17. Central tendency measures - plot
The median splits the area under the density plot into two equal parts.
18. Central tendency measures - plot
And the mean is simply the average value.
19. Skewness
In a perfectly symmetrical unimodal distribution, all three central tendency measures are equal. If the distribution is not symmetrical, it is skewed.
20. Skewness
We say that the distribution is skewed to the right when the data piles up on the left and the long tail stretches to the right.
21. Skewness
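The distribution is skewed to the left when the data piles up on the right and the long tail stretches to the left.
Take a look at the relationship between the mean and the median in skewed distributions: the mean is typically pulled toward the long tail, so it tends to be greater than the median in right-skewed data and smaller than the median in left-skewed data. Interviewers may check your knowledge of central tendency measures by asking a question about skewness.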
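To see the mean-versus-median relationship on concrete numbers, here is a small sketch with a hypothetical right-skewed sample and its mirror image. The `skewness` helper is an assumption for illustration: it computes the moment-based (Fisher-Pearson) coefficient m3 / m2^1.5 from population moments. Keep in mind that mean > median for right skew is a rule of thumb, not a guarantee.

```python
from statistics import mean, median

def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (population moments). Assumes values are not all equal."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]      # hypothetical: long tail to the right
left_skewed = [21 - x for x in right_skewed]  # mirror image: long tail to the left

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    # Right skew: mean above median, positive skewness; left skew: the opposite.
    print(name, mean(data), median(data), round(skewness(data), 2))
```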
22. Descriptive statistics
Variability is the second feature of data that often appears during interviews.
Let's quickly recall what variability is.
23. Variability
Take a look at the two distributions in the graph.
24. Variability
The green distribution is more concentrated around the mean; hence, its variability is lower.
25. Variability
The blue distribution is flatter and more spread out, so its variability is higher.
26. Variability measures
The most common variability measures are the variance and the standard deviation.
You may come across the range as well, but it is used less often.
27. Variability measures
To derive the variance, calculate the difference between each observation's value and the overall mean, square these differences, and take their average.
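The verbal recipe can be written as a formula. The notation below is ours: $x_1, \dots, x_n$ are the observations, $n$ is their count, and $\bar{x}$ is their mean. This is the version that divides by $n$; many textbooks and libraries also define a sample variance that divides by $n - 1$ instead (Bessel's correction).

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\operatorname{Var}(x) = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
```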
28. Variance - numerical example
Let's calculate the variance by hand using a simple example.
There are three values: 2, 5, and 11.
First, we calculate the mean: (2 + 5 + 11) / 3 = 6.
We subtract the mean from each of the values and square the differences: (2 - 6)² = 16, (5 - 6)² = 1, and (11 - 6)² = 25.
In the next step, we calculate the sum of the squared differences: 16 + 1 + 25 = 42.
Then we divide the sum by the number of values: 42 / 3 = 14.
The variance amounts to 14.
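The same hand calculation can be reproduced step by step in Python. The variable names are illustrative; `statistics.pvariance` divides by n, matching the calculation above, while `statistics.variance` divides by n - 1 and therefore gives a different number for the same data.

```python
from statistics import pvariance, variance

values = [2, 5, 11]
m = sum(values) / len(values)                   # mean: 6.0
squared_diffs = [(x - m) ** 2 for x in values]  # [16.0, 1.0, 25.0]
total = sum(squared_diffs)                      # 42.0
var = total / len(values)                       # 42 / 3

print(var)                # 14.0
print(pvariance(values))  # population variance, divides by n
print(variance(values))   # sample variance, divides by n - 1 (42 / 2)
```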
29. Variability measures
The second variability measure is standard deviation. The standard deviation is simply the square root of the variance.
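Continuing with the three values from the numerical example, a short sketch of the standard deviation as the square root of the variance. `statistics.pstdev` is the standard-library counterpart that divides by n.

```python
import math
from statistics import pstdev, pvariance

values = [2, 5, 11]
std_by_hand = math.sqrt(pvariance(values))  # square root of the variance, 14

print(round(std_by_hand, 3))                      # 3.742
print(math.isclose(std_by_hand, pstdev(values)))  # True
```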
30. Variability measures
And the range is the difference between the maximum and the minimum values.
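A minimal sketch of the range on the same three values. The extra outlier sample is hypothetical and only illustrates that the range depends on just the two extreme values, which is one common reason it is used less often than the variance or standard deviation.

```python
def value_range(values):
    """Difference between the maximum and the minimum value."""
    return max(values) - min(values)

values = [2, 5, 11]
print(value_range(values))  # 9

# Hypothetical outlier: a single extreme value changes the range dramatically.
print(value_range(values + [100]))  # 98
```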
31. Summary
Let's wrap up. In this video, we've covered central tendency measures: mean, median, and mode. Then we discussed skewness. We have also gone over variability measures, including variance, standard deviation, and range.
32. Let's practice!
Let's practice descriptive statistics before the interview!