1. Beginning visualization
Thanks for coming back!
2. Objective
This lesson will look at some basic visualizations to understand when and why to utilize them. Our primary focus will be bar charts, histograms, and scatter plots. There are many more charts. Covering these three will give you a foundation for exploring the wide world of visualizations.
3. Setting the bar
The first two charts, bar charts and histograms, look very similar and are often confused for one another at a glance. They both use bars of different heights to display information about the underlying data. They differ in the kind of underlying data they support.
The underlying data for a bar chart is categorical. Categorical data can be thought of as labels for data points, like the type of visitor someone was at a museum (free, paid, member, premium) or the country someone is from. A bar chart then displays one bar per label, with a height based on summary information for that label. The most common summary is the frequency, or count, of each label.
4. Museum admissions
Looking at an example dataset of twenty visitors to a museum, we can count each label and then construct a bar chart showing there were ten paid, two free, five members, and three premium visitors.
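As a rough sketch of the counting step, the snippet below tallies the labels and draws a text-only bar chart. The `visitors` list is reconstructed from the counts quoted above, and the `text_bar_chart` helper is a hypothetical name used only for this illustration.

```python
from collections import Counter

# Visitor labels rebuilt from the counts in the example:
# ten paid, two free, five member, three premium (twenty in total).
visitors = ["paid"] * 10 + ["free"] * 2 + ["member"] * 5 + ["premium"] * 3

# Count how many times each label appears.
counts = Counter(visitors)


def text_bar_chart(label_counts):
    """Return one line per label, with a bar whose length equals the count."""
    width = max(len(label) for label in label_counts)
    return [
        f"{label:<{width}} | {'#' * n} {n}"
        for label, n in label_counts.most_common()
    ]


for line in text_bar_chart(counts):
    print(line)
```

Each bar's length is simply the count for that label, which is all a frequency bar chart encodes.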
5. Museum admissions
This visualization is simple but useful for getting a feel for the admission breakdown at a glance.
6. Charting payments
We can also use bar charts to display other numerical summaries, like this bar chart of the average sale price by payment method. This gives us useful information we could leverage for further investigation. Even though contactless payments have the highest average sale price, how often do people actually use them? Charting information is an excellent way to inspire new avenues of discovery.
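To make the follow-up question concrete, here is a minimal sketch that computes both the average sale price and the number of sales for each payment method. The `sales` values are invented for illustration and are not the data behind the chart in the lesson.

```python
from collections import defaultdict

# Invented (payment method, sale amount) pairs, purely for illustration.
sales = [
    ("cash", 12.0), ("cash", 18.0),
    ("card", 25.0), ("card", 35.0), ("card", 30.0),
    ("contactless", 60.0),
]

# Group the sale amounts by payment method.
amounts_by_method = defaultdict(list)
for method, amount in sales:
    amounts_by_method[method].append(amount)

# The mean would set the bar height; the count shows how often
# each payment method was actually used.
summary = {
    method: {"count": len(amounts), "mean": sum(amounts) / len(amounts)}
    for method, amounts in amounts_by_method.items()
}

for method, stats in summary.items():
    print(f"{method:<12} mean={stats['mean']:.2f} count={stats['count']}")
```

In this made-up data the method with the highest average is also the least used, which is exactly the kind of detail a bar chart of averages alone would hide.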
7. Behind bars
Moving back to histograms: they are similar to bar charts, with one main difference. The data displayed is numerical rather than categorical. The data is broken up into bins of equal width, and the height of each bar represents the number of data points that fall within that bin's range.
8. Visualizing sales
If we take eleven sales from the museum gift shop and use a bin width of twenty dollars, we can see that five sales are between 0 and 20 dollars, three sales are between 20 and 40 dollars, and so on for the rest.
This aligns with our histogram. One thing to remember when looking at a histogram is that, unlike a bar chart, it has no intentional gaps between the bars. If you see a gap, it means no underlying data falls in that bin.
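The binning step can be sketched as follows. The eleven sale amounts are hypothetical: they are chosen so that five fall in the 0 to 20 bin and three in the 20 to 40 bin, as in the example, while the remaining values are made up, including an empty bin to show a gap. The sketch also assumes each bin includes its lower edge and excludes its upper edge.

```python
# Hypothetical sale amounts in dollars (eleven sales).
sales = [4, 8, 12, 15, 19, 22, 30, 38, 45, 55, 90]
BIN_WIDTH = 20


def bin_counts(values, width):
    """Count values per bin, where bin i covers [i * width, (i + 1) * width)."""
    n_bins = int(max(values) // width) + 1
    counts = [0] * n_bins
    for value in values:
        counts[int(value // width)] += 1
    return counts


counts = bin_counts(sales, BIN_WIDTH)

# Print a text histogram; an empty bar is a gap, meaning no data in that bin.
for i, n in enumerate(counts):
    low = i * BIN_WIDTH
    print(f"{low:>3}-{low + BIN_WIDTH:<3} | {'#' * n} {n}")
```

With these values the 60 to 80 bin is empty, so it appears as a gap between its neighbours.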
9. Connecting the dots
The last visualization we will cover here is the scatter plot. The goal of the scatter plot is to show the relationship between two different numerical variables. We arrange points on a chart by using the value of one variable for the horizontal position and the other for the vertical position. By doing this, we can detect relationships within the dataset more easily.
10. Don't sit so close
Let's look at a sample scatter plot of time spent watching television versus the score a student receives on a math test. We can see from the data points that there appears to be a strong negative relationship between the two: as the total time spent watching TV goes up, the exam score goes down. This is the power of scatter plots. Drawing that conclusion from raw data would have taken a lot of effort, requiring us to parse each data point, but with a visualization it's easy to look at the entire dataset at once.
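As an illustration of how points are positioned, the sketch below places each data point on a character grid, using one variable for the column and the other for the row. The `tv_hours` and `scores` values are invented for this example, and `text_scatter` is a hypothetical helper, not part of any library.

```python
# Invented data: hours of TV watched and the matching math test scores.
tv_hours = [0, 1, 2, 3, 4, 5]
scores = [95, 88, 80, 74, 65, 60]


def text_scatter(xs, ys, width=30, height=10):
    """Return lines of a character grid with '*' at each (x, y) point.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (horizontal) and a row (vertical).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the chart
    return ["".join(cells) for cells in grid]


for line in text_scatter(tv_hours, scores):
    print("|" + line)
```

With this data the points run from the top left to the bottom right, the visual signature of a relationship where one variable falls as the other rises.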
11. Let's practice!
Visualizing data makes it much easier to understand. Let's test your understanding of the three basic visualizations we covered in this lesson with some exercises.