Faceting word count plots

1. Faceting word count plots

We often want to compare two or more subsets of the data in the same plot. For example, in our review data we have two products. It would be nice to visualize the word counts by product. We already know how to create plots with facets, but there are a few functions we'll need to learn in addition.

2. Counting by product

The tidy_review data has already been tokenized and stop words have been removed. The first thing we want to do is compute the word counts. However, instead of just counting words, we include both word and product in the count() function. For convenience, we also arrange these counts in descending order to see that we are indeed counting by product, with the product column now part of the output of the count() function.
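As a rough sketch of this counting step, the snippet below builds a tiny stand-in for tidy_review (the rows and product labels are invented for illustration, not taken from the real review data) and counts by both word and product.

```r
library(dplyr)

# Hypothetical stand-in for the tokenized review data: one row per word.
tidy_review <- tibble(
  product = c("A", "A", "A", "B", "B", "B", "B"),
  word    = c("clean", "clean", "battery", "clean", "map", "map", "map")
)

# Count each word within each product, then sort by the count column n.
word_counts <- tidy_review %>%
  count(word, product) %>%
  arrange(desc(n))

word_counts
```

The result has one row per word-product pair, with the product column sitting alongside word and n.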

3. Using slice_max()

Before we plot, we need to keep just the most common words for each of the products. We can use a dplyr function called slice_max(), which keeps the top rows of a data frame ranked by a column we choose. Here we first use group_by(product) and then slice_max(n, n = 10). The first argument, n, is the count column to rank by; the second, n = 10, says we want the top 10 rows in each group. By default ties are kept, so a group can return more than 10 rows when counts are tied at the cutoff. In the output, note the line at the top that explains the data is still grouped by product.
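A minimal sketch of the grouped slice_max() step, assuming dplyr 1.0.0 or later. The counts are made up for illustration, and the top 2 rows are kept instead of 10 so the toy output stays small.

```r
library(dplyr)

# Hypothetical word counts per product (illustrative values only).
word_counts <- tibble(
  word    = c("clean", "map", "battery", "clean", "hair", "robot"),
  product = c("A", "A", "A", "B", "B", "B"),
  n       = c(5L, 3L, 1L, 4L, 2L, 1L)
)

# Keep the two highest-count words within each product.
# The first argument is the column to rank by; n = 2 is how many rows to keep.
top_words <- word_counts %>%
  group_by(product) %>%
  slice_max(n, n = 2)

top_words
is_grouped_df(top_words)  # the result is still grouped by product
```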

4. Using ungroup()

We should be careful with some functions, including mutate(), when our data is grouped, because they operate within each group rather than on the whole data frame. Before we create a factor of words ordered by the counts, we need to use the ungroup() function, which simply removes the effect of group_by(). The output no longer has the line saying it's grouped by the product column.
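To see why grouping matters for mutate(), here is a small sketch using invented counts. The same mutate() call gives per-product totals while the data is grouped and a single overall total after ungroup().

```r
library(dplyr)

# Hypothetical per-product word counts (illustrative values only).
word_counts <- tibble(
  word    = c("clean", "map", "hair", "clean"),
  product = c("A", "A", "B", "B"),
  n       = c(5L, 3L, 4L, 2L)
)

grouped <- word_counts %>% group_by(product)

# While grouped, sum(n) is computed separately within each product.
grouped_totals <- grouped %>% mutate(total = sum(n))

# After ungroup(), the same mutate() uses every row at once.
ungrouped_totals <- grouped %>% ungroup() %>% mutate(total = sum(n))

grouped_totals
ungrouped_totals
```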

5. Using fct_reorder()

Now we're ready to mutate() using fct_reorder() as before. We have all of the steps: count() by product, group_by() product, use slice_max(), ungroup(), and mutate() with fct_reorder(). We save this out as word_counts. Note that the product column has been retained throughout this process so we'll be able to facet using it.
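Put together, the steps might look like the sketch below. The tidy_review rows are invented, the top 2 words are kept instead of 10, and the new factor column name word2 is an assumption made for this example.

```r
library(dplyr)
library(forcats)

# Hypothetical stand-in for the tokenized review data: one row per word.
tidy_review <- tibble(
  product = rep(c("A", "B"), each = 6),
  word    = c(rep("clean", 3), rep("battery", 2), "map",
              rep("hair", 3), rep("clean", 2), "noise")
)

word_counts <- tidy_review %>%
  count(word, product) %>%                # count words within each product
  group_by(product) %>%                   # rank separately per product
  slice_max(n, n = 2) %>%                 # keep the top words per product
  ungroup() %>%                           # drop grouping before mutate()
  mutate(word2 = fct_reorder(word, n))    # factor of words ordered by count

word_counts
```

Because a word such as "clean" can appear under both products, fct_reorder() has to place it using a summary of its counts (the median by default), which is why the ordering within a single facet is not always exact.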

6. Using facet_wrap()

Since we have the product column, we can not only create a facet, we can also include some color! We've added a new aesthetic named fill assigned to product. We've also included a new argument for geom_col() named show.legend that we've set to FALSE, meaning we don't want a legend for this particular plot. Finally, we've included facet_wrap(), where we use the tilde with product to say we want to create facets based on product. The scales argument is also set to "free_y", meaning the y-axis can be different for each plot.
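A minimal sketch of such a plot follows. The counts and the word2 column name are assumptions, and the words are mapped directly to the y aesthetic (which needs ggplot2 3.3.0 or later) rather than flipping the coordinates, so the lesson's own plotting code may differ.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical top word counts per product (illustrative values only).
word_counts <- tibble(
  word    = c("clean", "battery", "hair", "clean"),
  product = c("A", "A", "B", "B"),
  n       = c(3L, 2L, 3L, 2L)
) %>%
  mutate(word2 = fct_reorder(word, n))

p <- ggplot(word_counts, aes(x = n, y = word2, fill = product)) +
  geom_col(show.legend = FALSE) +            # color by product, no legend
  facet_wrap(~ product, scales = "free_y") + # one panel per product
  labs(x = "Count", y = "Word")

p
```

With scales = "free_y", each panel lists only the words that occur for its own product instead of sharing one word axis across panels.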

7. Using facet_wrap()

And there we have it! A plot with facets based on product. Note that when words are shared across facets, the order might not be exactly right, but it's close enough to get a sense of how similar the collection of reviews for each product is.

8. Let's practice!

That might seem like a lot of steps, but the output is worth it! Let's practice.
