1. Visualizing Distributions
Now we will focus primarily on visualizations. A good readout to your audience often includes compelling visualizations. In this section, we discuss the histogram, which communicates how frequently the values of a single variable occur.
2. Common distribution
It turns out a lot of the things we measure in the world look something like this. This is the distribution of human weight from a sample. The most frequent value is the highest point, and weight values become less numerous the farther you move to either side.
3. Another example
Here's more data, representing minutes for pizza delivery. Yes, pizza delivery will affect your weight, but really these two data distributions aren't related.
In both cases, the highest point is in the middle of the distribution. Put another way, the most frequent value is the highest point. The curve here is symmetrical, meaning it falls away equally on either side of that point. The resulting shape is similar to a bell, so this distribution is often called a bell curve.
In fact, the bell curve is so common that it is formally called the "normal distribution".
4. The Histogram
This is a histogram of a normal distribution. Again, notice the peak in the middle and the symmetrical slope to either side. Instead of the smooth curve in the last slide, the rectangular bars grow taller as more data falls within that value range. For example, the most frequent value in this data is 10. Thus the highest bars are on either side of 10. In this chapter you will learn to see the relationships between the histogram visual and the summary stats you learned about before.
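To make the idea of bars growing with the amount of data in each value range concrete, here is a minimal Python sketch. The sample values, the function names `bin_counts` and `text_histogram`, and the bin width are all hypothetical and chosen only for illustration; the course itself builds histograms in Sheets.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count how many values fall into each bin of width bin_width.

    Each bin is labeled by its lower edge. Bins that contain no
    values are simply absent from the result.
    """
    counts = Counter()
    for v in values:
        lower_edge = (v // bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


def text_histogram(values, bin_width):
    """Return one line per non-empty bin, with one '#' per value in it."""
    lines = []
    for lower_edge, count in bin_counts(values, bin_width).items():
        label = f"{lower_edge:>3} to {lower_edge + bin_width:<3}"
        lines.append(f"{label} {'#' * count}")
    return "\n".join(lines)


# Hypothetical sample clustered around 10 (not the course data).
sample = [7, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13]
print(text_histogram(sample, bin_width=1))
```

The longest row of `#` marks plays the role of the tallest bar, and the rows shrink symmetrically as you move away from it.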
5. Comparing smoothed Density plot & histogram
Overlaying the smooth curve on the rectangular bars of a histogram, you can easily see that they represent the same information. As before, the histogram here can be called "normal", a term that has a mathematical definition. Conceptually, a normal distribution is symmetrical and shaped like a bell. This means that the data extends equally to the left and right of the most frequent value. In normal distributions, the mean, median and mode have similar values. As a result, not only is the distribution centered at the mean, median and mode, but the frequencies decrease symmetrically on either side of them, which also shapes other summary stats such as the quartiles and the standard deviation. In a histogram this is represented by smaller and smaller rectangles extending outwards from the peak.
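As a rough check of the claim that the mean, median and mode line up in symmetrical data, the sketch below uses Python's standard `statistics` module on a small, made-up symmetric sample. The data and the helper name `center_summary` are assumptions for illustration only.

```python
from statistics import mean, median, mode, quantiles


def center_summary(values):
    """Return the mean, median, mode and quartiles of values."""
    q1, q2, q3 = quantiles(values, n=4)
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "q1": q1,
        "q2": q2,
        "q3": q3,
    }


# Hypothetical symmetric sample centered on 10.
symmetric = [8, 9, 9, 10, 10, 10, 11, 11, 12]
summary = center_summary(symmetric)

for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the three measures of center coincide, and the first and third quartiles sit at equal distances from the median, which is the numeric counterpart of the symmetrical bars in the histogram.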
6. Other distributions
There are many distribution shapes. Some of the more common ones are shown here. For each, the summary stats will change. For example, the mean is affected by outliers, or extreme values in the data. The median, however, is far less sensitive to them. So in a "skewed" distribution these two values will differ, unlike in a normal or symmetrical distribution, where the mean and median are very close. As you get more familiar with summary statistics, you'll be able to spot a non-normal distribution by reviewing the stats instead of looking at a histogram.
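The effect of an outlier on the mean versus the median can be sketched in a few lines of Python. The delivery times below are invented for illustration, and `shift_from_outlier` is a hypothetical helper, not something from the course.

```python
from statistics import mean, median


def shift_from_outlier(values, outlier):
    """Return how far the mean and the median move when outlier is added."""
    extended = values + [outlier]
    mean_shift = mean(extended) - mean(values)
    median_shift = median(extended) - median(values)
    return mean_shift, median_shift


# Hypothetical delivery times in minutes, roughly symmetric.
delivery_minutes = [24, 26, 27, 28, 30, 31, 32]

# One very late delivery acts as the outlier.
mean_shift, median_shift = shift_from_outlier(delivery_minutes, 95)
print(f"mean moved by {mean_shift:.2f} minutes")
print(f"median moved by {median_shift:.2f} minutes")
```

A single extreme value drags the mean noticeably toward it, while the median moves only slightly. A large gap between the two is therefore a hint that the distribution is skewed.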
7. How normal is the data distribution?
Often in statistics you will want to test how "normal" the data is. Even when you can plainly see a bell curve shape, there are two mathematical aspects of a normal distribution to check. First, how much does the data lean, or skew, away from the most frequent value? In the visual, this is shown with the red arrows. A normal distribution will have little to no skew, which is represented by the vertical red line. Second, to see how the tails trail off on either side, you review the blue lines. These two aspects, leaning and trailing off, are measured by "skew" and "kurtosis". In Sheets, use `SKEW` and `KURT` along with the data range to get the leaning and trailing-off statistics. There are many opinions on acceptable values for skew and kurtosis to describe a normal distribution. Typically, I've seen that values between -2 and 2 for both can indicate a "normal" distribution.
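For readers who want to see what sits behind the two statistics, here is a Python sketch of sample skewness and sample excess kurtosis. It assumes the bias-adjusted sample formulas commonly documented for spreadsheet `SKEW` and `KURT` functions; whether Sheets matches them to the last decimal is an assumption here, not something stated in the course. The function names, the `looks_normal` helper and both samples are hypothetical, and the default limit of 2 simply mirrors the rule of thumb mentioned above.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Bias-adjusted sample skewness (how far the data leans)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    m, s = mean(values), stdev(values)
    cubed = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (how the tails trail off)."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    m, s = mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def looks_normal(values, limit=2):
    """Rule-of-thumb check: both statistics lie between -limit and limit."""
    return (abs(sample_skewness(values)) <= limit
            and abs(sample_excess_kurtosis(values)) <= limit)


# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [7, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13]
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 15]

for name, data in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    print(name, sample_skewness(data), sample_excess_kurtosis(data),
          looks_normal(data))
```

In Sheets the equivalent check would be two formulas such as `=SKEW(A2:A17)` and `=KURT(A2:A17)`, where the range is only a placeholder for wherever your data lives.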
8. Histogram in Sheets: Part 1
Adding a histogram in Sheets is straightforward.
9. Histogram in Sheets: Part 2
In the menu bar, go to "Insert"
10. Histogram in Sheets: Part 3
then "Chart". This brings up the chart editor.
11. Histogram in Sheets: Part 4
Select the histogram from the "Chart type" drop-down.
12. Histogram in Sheets: Part 5
And finally declare your data range
13. Histogram in Sheets: Part 6
for the variable of interest.
14. Let's practice!
In the upcoming exercises, you will calculate summary statistics, then see how the visuals change. Plus, you'll check for normal distributions using skew and kurtosis.