
Visualizing categorical data

1. Visualizing categorical data

Welcome to the final lesson of the chapter! We'll be discussing categorical data and how to visualize it. We will cover sorting and padding bar plots, changing the orientation of x-axis labels, and building a grouped bar plot.

2. Categorical data

Let's quickly recap what categorical data is. Categorical data is any data that takes one of a fixed number of options or labels, such as gender or country of birth. Bokeh's documentation calls these labels factors, and we will use that term too. Discussing categorical data in detail is outside the scope of this course, but a basic understanding is important when deciding how to visualize data by category. We have already worked with categorical data: the position, team, and conference columns in our NBA dataset.

3. Sorting

Bar plots are a common way to visualize categorical data. But what if we want to display the bars from highest to lowest? We group by position and calculate the average points, then call the pandas sort_values method on the resulting DataFrame. We pass the column we want to sort by and set the ascending argument to False so the bars appear in descending order. Because Bokeh draws categorical bars in the order of the figure's x_range, we pass the sorted position labels as x_range. The updated plot makes it easier to compare average points across positions.
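A minimal sketch of the sorting step, assuming a DataFrame with position and points columns. The values below are a made-up stand-in for the NBA dataset, not real statistics, and the variable names nba and positions are illustrative.

```python
import pandas as pd
from bokeh.plotting import figure

# Hypothetical stand-in for the NBA dataset (illustrative values only)
nba = pd.DataFrame({
    "position": ["PG", "PG", "SG", "SF", "PF", "C", "C"],
    "points": [20.0, 16.0, 15.0, 12.0, 14.0, 10.0, 12.0],
})

# Average points per position, sorted from highest to lowest
positions = (
    nba.groupby("position", as_index=False)["points"].mean()
    .sort_values("points", ascending=False)
)

# Bokeh draws categorical bars in x_range order, so pass the sorted labels
fig = figure(x_range=positions["position"].tolist(), title="Average points by position")
fig.vbar(x=positions["position"], top=positions["points"], width=1.0)
```

If x_range were built from an unsorted list of labels instead, the bars would follow that order regardless of how the DataFrame itself is sorted.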

4. Padding

Now that our visualization is sorted, we can improve it by adding a gap between the bars. To do this, we pass the width argument to the figure-dot-vbar method, using a value between zero and one that sets the fraction of each category's space a bar occupies; at one, neighboring bars touch. This makes the individual bars much easier to tell apart.
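A short sketch of padding, assuming pre-computed averages in a DataFrame with position and points columns (the numbers are hypothetical). Each category occupies one unit along the x-axis, so width=0.8 leaves a gap of 0.2 between neighboring bars.

```python
import pandas as pd
from bokeh.plotting import figure

# Hypothetical sorted averages (illustrative values only)
positions = pd.DataFrame({
    "position": ["PG", "SG", "PF", "SF", "C"],
    "points": [18.0, 15.0, 14.0, 12.0, 11.0],
})

fig = figure(x_range=positions["position"].tolist())

# width is a fraction of each category's slot: 1.0 makes bars touch,
# 0.8 leaves a 0.2 gap between neighboring bars
renderer = fig.vbar(x=positions["position"], top=positions["points"], width=0.8)
```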

5. Orientation

Category labels are often long, for example basketball positions written in full rather than abbreviated, and long labels overlap on the x-axis. In this case, we set the figure's xaxis-dot-major_label_orientation property to change the orientation of the x-axis labels. The value is an angle in radians (it also accepts strings such as "horizontal" or "vertical"), so a forty-five-degree rotation from horizontal is pi divided by four; a plain value of forty-five would be read as forty-five radians, not degrees.
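A minimal sketch of rotating long labels. The full position names and the average values are hypothetical; the point is that major_label_orientation expects radians, so math.pi / 4 gives a 45-degree tilt.

```python
import math
from bokeh.plotting import figure

# Hypothetical long-form labels and made-up averages
positions = ["Point Guard", "Shooting Guard", "Small Forward", "Power Forward", "Center"]
avg_points = [18.0, 15.0, 12.0, 14.0, 11.0]

fig = figure(x_range=positions)
fig.vbar(x=positions, top=avg_points, width=0.8)

# Angle in radians: pi / 4 rotates the labels 45 degrees from horizontal
fig.xaxis.major_label_orientation = math.pi / 4
```

Using math.radians(45) instead of math.pi / 4 would give the same angle and may read more naturally.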

6. Rotated x-axis labels

Now we can include longer category names on our x-axis!

7. Nested categories

Suppose we want to create a bar plot where each glyph represents average points per basketball position, split by the East and West conferences. In this instance, we can create a grouped bar plot using multiple levels of categories, known as nested categories. To do this, we store a variable called factors: a list of tuples covering every combination of the categories we want to plot. Here we can see nested categories created from the conference and position values in the NBA dataset.
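One way to build the factors list, sketched under the assumption that the data has conference, position, and points columns (the rows below are invented). Deriving the tuples from the grouped averages keeps each factor in the same order as its average, which matters when the averages are later used as bar heights.

```python
import pandas as pd

# Hypothetical subset of the NBA data (illustrative values only)
nba = pd.DataFrame({
    "conference": ["East", "East", "East", "West", "West", "West"],
    "position": ["PG", "SG", "C", "PG", "SG", "C"],
    "points": [21.0, 15.0, 12.0, 19.0, 16.0, 11.0],
})

# Average points for every (conference, position) combination
grouped = nba.groupby(["conference", "position"], as_index=False)["points"].mean()

# Each factor is a tuple: (outer category, inner category)
factors = list(zip(grouped["conference"], grouped["position"]))
print(factors)
# [('East', 'C'), ('East', 'PG'), ('East', 'SG'), ('West', 'C'), ('West', 'PG'), ('West', 'SG')]
```

If some combination were missing from the data, this approach would simply omit it; itertools.product over the unique values is an alternative when every pairing must appear.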

8. Building a grouped bar plot

Using our factors variable, we can build a grouped bar plot of points that accounts for both a player's position and the conference they play in. We import FactorRange from bokeh-dot-models. When creating our figure, we set x_range to a call of FactorRange, placing an asterisk directly before our factors variable; the asterisk unpacks the list so that each tuple is passed to FactorRange as a separate factor. We then call fig-dot-vbar, with x equal to factors and top equal to the points column, whose values must be in the same order as factors.
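A self-contained sketch of the grouped bar plot, reusing the same invented data as a stand-in for the NBA dataset. It wraps the columns in a ColumnDataSource, which is one common way to feed nested factors to vbar; the column names x and points in the source are arbitrary choices.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure

# Hypothetical subset of the NBA data (illustrative values only)
nba = pd.DataFrame({
    "conference": ["East", "East", "East", "West", "West", "West"],
    "position": ["PG", "SG", "C", "PG", "SG", "C"],
    "points": [21.0, 15.0, 12.0, 19.0, 16.0, 11.0],
})
grouped = nba.groupby(["conference", "position"], as_index=False)["points"].mean()
factors = list(zip(grouped["conference"], grouped["position"]))

# Both columns come from `grouped`, so each height lines up with its factor
source = ColumnDataSource(data={"x": factors, "points": grouped["points"].tolist()})

# The asterisk unpacks the list: FactorRange(("East", "C"), ("East", "PG"), ...)
fig = figure(x_range=FactorRange(*factors), title="Average points by conference and position")
fig.vbar(x="x", top="points", width=0.9, source=source)
```

FactorRange(factors=factors) is an equivalent keyword form; the unpacked version matches the call described above.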

9. Grouped bar plot

Our output has two levels of x-axis labels, one for position and one for conference. It looks like Eastern Conference point guards score almost 2 points more per game on average than their Western Conference counterparts!

10. Let's practice!

Now let's build our own categorical plots!
