1. Standardizing data
In this lesson, you'll learn how to standardize your data.
2. Why standardize?
Why do this? Many real-world datasets contain variables measured on different scales. For example, height might be measured in feet, while weight might be measured in pounds. This poses a problem, because variables on different scales are hard to compare directly. It can also lead you to misjudge how important a particular column is: a column may appear more important simply because its values are larger, when in reality its distribution may be very similar to that of the column with smaller values.
The solution is to standardize your data so that all your variables are on the same scale. In statistics, standardization centers each variable on its mean and rescales it by its standard deviation. Every data point is then expressed as the number of standard deviations it lies above or below the mean.
You can standardize your data by calculating z-scores, also known as standard scores. Z-scores build directly on the mean and standard deviation, which you've already seen in this chapter.
To calculate the z-score of a data point, subtract the mean and divide by the standard deviation,
3. Calculating Z-scores: Part 1
as shown on this simple example here, in which we have 3 data points.
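Written as a formula, the calculation looks like this. Here $x_i$ is one of the $n$ data points, $\mu$ is their mean, and $\sigma$ is their population standard deviation (the kind STDEVP returns). The symbols are this sketch's notation, not the slide's:

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}, \qquad
z_i = \frac{x_i - \mu}{\sigma}
```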
4. Calculating Z-scores: Part 2
First, we need to calculate the mean, using the AVERAGE function,
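For comparison outside a spreadsheet, here is a minimal Python sketch of the same step. It assumes three hypothetical values (1, 2, 3) that are not necessarily the ones on the slide. It also treats `statistics.fmean` as the standard-library counterpart of AVERAGE for this purpose:

```python
from statistics import fmean

# Hypothetical example values standing in for the three data points
data = [1.0, 2.0, 3.0]

# AVERAGE equivalent: sum of the values divided by how many there are
mean_manual = sum(data) / len(data)
mean_lib = fmean(data)

print(mean_manual, mean_lib)  # 2.0 2.0
```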
5. Calculating Z-scores: Part 3
and then the population standard deviation, using STDEVP.
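STDEVP treats the data as a whole population and divides the sum of squared deviations by n. STDEV treats it as a sample and divides by n - 1. Python's `statistics.pstdev` and `statistics.stdev` draw the same distinction. Here is a sketch, again using hypothetical values rather than the slide's:

```python
import math
from statistics import pstdev, stdev

data = [1.0, 2.0, 3.0]  # hypothetical example values
mean = sum(data) / len(data)

# Population standard deviation (like STDEVP): divide squared deviations by n
pop_sd_manual = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

# Library versions: pstdev divides by n, stdev divides by n - 1
pop_sd = pstdev(data)
sample_sd = stdev(data)

print(round(pop_sd_manual, 4), round(pop_sd, 4), round(sample_sd, 4))  # 0.8165 0.8165 1.0
```

With only three points, the choice of divisor changes the result noticeably. That is why it matters which function feeds the z-score.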
6. Calculating Z-scores: Part 4
Let's put these two values in two new columns.
7. Calculating Z-scores: Part 5
To calculate the z-score, we then subtract the mean from each data point and divide the result by the standard deviation.
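The manual steps translate directly into a short Python sketch. It assumes hypothetical values 1, 2, 3 and uses `statistics.pstdev` to match STDEVP:

```python
from statistics import fmean, pstdev

data = [1.0, 2.0, 3.0]  # hypothetical example values

# Helper values, like the two extra columns in the spreadsheet
mean = fmean(data)
sd = pstdev(data)  # population SD, matching STDEVP

# z-score: subtract the mean, then divide by the standard deviation
z_scores = [(x - mean) / sd for x in data]

print([round(z, 4) for z in z_scores])  # [-1.2247, 0.0, 1.2247]
```

The resulting z-scores have a mean of 0 and a population standard deviation of 1. This is what "standardized" means in practice.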
8. Calculating Z-scores: Part 6
But you probably don't want to calculate this manually,
9. Calculating Z-scores: Part 7
as we're doing here.
Just as with the standard deviation, variance, mean, median, and other statistics you've seen so far, there's a spreadsheet function that makes it easy to calculate z-scores.
10. Calculating Z-scores in Google Sheets: Part 1
In the STANDARDIZE function,
11. Calculating Z-scores in Google Sheets: Part 2
you need to pass in the data point,
12. Calculating Z-scores in Google Sheets: Part 3
the mean,
13. Calculating Z-scores in Google Sheets: Part 4
and the standard deviation,
14. Calculating Z-scores in Google Sheets: Part 5
as shown here.
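STANDARDIZE does not compute the mean or standard deviation itself. You pass in the value, the mean, and the standard deviation, in that order. Below is a hypothetical Python function, `standardize`, that mimics that argument order. Raising an error for a non-positive standard deviation is this sketch's own choice, since a z-score is undefined when there is no spread:

```python
from statistics import fmean, pstdev

def standardize(value: float, mean: float, std_dev: float) -> float:
    """Hypothetical analogue of STANDARDIZE(value, mean, standard_deviation)."""
    # Division by a zero (or negative) spread has no meaning for a z-score
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (value - mean) / std_dev

data = [1.0, 2.0, 3.0]  # hypothetical example values
mean, sd = fmean(data), pstdev(data)

# One call per data point, like filling the formula down a column
print([round(standardize(x, mean, sd), 4) for x in data])  # [-1.2247, 0.0, 1.2247]
```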
15. Comparing
Let's say we had another set of data points that are 10 times larger.
16. Comparing: Part 2
As you can see here, while the mean and standard deviation are different (each is 10 times larger),
17. Comparing: Part 3
the z-scores are exactly the same! Although every value is 10 times larger, each data point's distance from its column's mean, measured in that column's standard deviations, is the same as in the first column. This is what lets you compare the two columns directly. In the exercises, you'll have the opportunity to practice standardizing your data.
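The claim is easy to check in code. This sketch multiplies a hypothetical column by 10 and standardizes both versions. `z_scores` is a helper name made up for this example:

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Standardize a list using its own mean and population SD."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]

small = [1.0, 2.0, 3.0]          # hypothetical first column
large = [10 * v for v in small]  # second column: every value 10 times larger

# The mean (and the SD) scale by 10...
print(fmean(small), fmean(large))  # 2.0 20.0
# ...but the z-scores match
print([round(z, 4) for z in z_scores(small)])
print([round(z, 4) for z in z_scores(large)])
```

The match holds for any positive scale factor c. Multiplying every value by c multiplies both the mean and the standard deviation by c, so the factor cancels in (x - mean) / sd.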
18. Almost there, let's standardize!
Almost done with chapter 1. You're a stats rockstar already!