Descriptive statistics

1. Descriptive statistics

In this chapter, we'll go over concepts in exploratory data analysis, including descriptive statistics, working with categorical data, and examining the relationship between two variables. Let's get started!

2. What are descriptive statistics?

Descriptive statistics are basically what they sound like. They help you describe your data with numerical calculations or plots. There are many different types of descriptive statistics, but we'll focus on those that are most common in interviews: measures of centrality and measures of variability.

3. Measures of centrality

To discuss measures of centrality, let's quickly review mean, median, and mode before we dive into some more complicated problem cases.

4. Measures of centrality

The mean is simply the average: the sum of the observations divided by the number of observations. The mode is the most common observation, or the peak of the distribution. Finally, the median is the middle value when all the observations are sorted. If the distribution is perfectly normal (symmetric with a single peak), then all three of these values will be the same. However, if the distribution is skewed, as seen here, then these values will differ. Interviewers will use these skewed scenarios to assess how comfortable you are with centrality metrics.
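To make the difference concrete, here is a minimal sketch using Python's built-in `statistics` module. The `data` list is a made-up, right-skewed sample chosen only for illustration; it does not come from the course.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one is very large.
data = [1, 2, 2, 2, 3, 3, 4, 5, 20]

mean = statistics.mean(data)      # sum divided by the number of observations
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most common observation

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```

In this sample the single large value pulls the mean above the median, while the mode stays at the most frequent value, so the three measures no longer agree.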

5. Measures of variability

When it comes to measures of variability, you should know all about variance and standard deviation, which are used to describe how spread out your data is. The range is simply the max minus the min, and isn't referenced much in interviews, so we'll move past it for now.

6. Measures of variability

The variance is computed by taking the difference between each data point and the mean, squaring those differences, and then averaging them: sum the squared differences and divide by the number of observations. (When you are working with a sample rather than the whole population, the sum is usually divided by n - 1 instead of n.) The standard deviation is just the square root of the variance. These formulas are fair game for interviewers, so make sure you're really comfortable with them. Think about how you would answer something as simple as the question: what is a standard deviation?
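The steps just described can be sketched in a few lines of plain Python. The `data` list is a hypothetical sample, and the population form (dividing by n) is assumed as the main calculation; the n - 1 version is shown alongside for comparison.

```python
import math
import statistics

# Hypothetical sample, chosen so the numbers come out round.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Difference between every data point and the mean, squared.
squared_diffs = [(x - mean) ** 2 for x in data]

# Population variance: average of the squared differences.
variance = sum(squared_diffs) / n
# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# Sample variance divides by n - 1 instead of n.
sample_variance = sum(squared_diffs) / (n - 1)

# Range: the max minus the min.
value_range = max(data) - min(data)

print(variance, std_dev, sample_variance, value_range)

# The standard library offers the same calculations.
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))
```

Being able to write this out by hand, and to say which denominator you are using and why, is probably more useful in an interview than remembering a library call.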

7. Modality

Along with the centrality and variability metrics, there are a few other things worth noting. The first is modality. The modality of a distribution is determined by the number of peaks it contains. Many of the distributions you'll encounter have only one peak (unimodal), but it's possible to encounter distributions with two or more peaks, as shown in the picture here under bimodal.
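As a rough illustration of counting peaks, the sketch below counts strict local maxima in a list of histogram bin counts. The helper `count_peaks` and both lists of counts are hypothetical; real, noisy data would normally need smoothing or a density estimate before peaks can be counted reliably.

```python
def count_peaks(counts):
    """Count strict local maxima in a list of histogram bin counts."""
    peaks = 0
    for i, c in enumerate(counts):
        # Treat positions beyond either end as lower than any real count.
        left = counts[i - 1] if i > 0 else float("-inf")
        right = counts[i + 1] if i < len(counts) - 1 else float("-inf")
        if c > left and c > right:
            peaks += 1
    return peaks


# Hypothetical bin counts for two histograms.
unimodal_counts = [1, 3, 7, 12, 7, 3, 1]
bimodal_counts = [2, 8, 3, 1, 4, 9, 2]

print(count_peaks(unimodal_counts))  # one peak
print(count_peaks(bimodal_counts))   # two peaks
```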

8. Skewness

Another important concept is skewness, which measures how asymmetric a distribution is. If we take another look at the distribution from our centrality metrics, we see that it's asymmetrical, with most of the data on the right and a long tail stretching out to the left. This distribution is skewed left (negatively skewed). The opposite can be true as well: a right-skewed distribution has most of its data on the left and a long tail to the right.
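One common way to put a number on skewness is the moment-based coefficient: the third central moment divided by the variance raised to the power 1.5. The sketch below assumes that definition (others exist) and uses made-up samples; the function name `skewness` is only illustrative.

```python
def skewness(data):
    """Moment-based skewness: third central moment / variance ** 1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples.
left_skewed = [1, 6, 7, 8, 8, 9, 9, 9, 10, 10]   # bulk on the right, tail on the left
right_skewed = [11 - x for x in left_skewed]      # mirror image of the sample above
symmetric = [1, 2, 3, 4, 5]

print(skewness(left_skewed))   # negative
print(skewness(right_skewed))  # positive
print(skewness(symmetric))     # zero
```

The sign tells you the direction of the tail: negative for a left-skewed distribution, positive for a right-skewed one, and zero for a perfectly symmetric one.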

9. Summary

To summarize, we talked a little about descriptive statistics and what they're used for, along with a more in-depth look at measures of centrality and measures of variability. We also brushed up on modality and skewness. We'll look further at some of the metrics as a whole, but it's also important to think about things in terms of testing and other practical situations. For instance, how would you run an A/B test if your results were right-skewed?

10. Let's prepare for the interview!

Let's keep going and practice on a few interview questions that I have encountered firsthand!