
Visualizing set differences

1. Visualizing set differences

So far in this chapter, you've been working with the batmobile and batwing tables, which contain the pieces for each of the two LEGO sets. Suppose that instead of working with pieces, you want to examine and compare the colors used in each set. This takes a little effort, but it produces an intuitive visualization of the differences between the color palettes of the Batmobile and the Batwing.

2. Aggregating sets into colors

Before doing any joining, you'll want to aggregate each set into colors. You've learned how to do this in dplyr with group_by() and summarize(): create a total column equal to the sum of the quantity column. Do the same aggregation for both the batmobile and batwing sets. You now have two tables, one for each set, where each table has one observation per color. The tables have only the color IDs right now, but we'll join in the color names later.
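As a minimal sketch, the aggregation might look like the R code below. The batmobile and batwing tables here are tiny made-up stand-ins with only color_id and quantity columns and invented values, not the real LEGO data; batmobile_colors and batwing_colors are illustrative names.

```r
library(dplyr)

# Made-up stand-ins for the batmobile and batwing tables (invented values):
# each row is a piece, with its color and how many the set contains
batmobile <- tibble(
  color_id = c(0, 0, 72, 71),
  quantity = c(10, 6, 4, 2)
)
batwing <- tibble(
  color_id = c(0, 72, 4),
  quantity = c(5, 7, 3)
)

# Collapse each set to one row per color, summing the piece quantities
batmobile_colors <- batmobile %>%
  group_by(color_id) %>%
  summarize(total = sum(quantity))

batwing_colors <- batwing %>%
  group_by(color_id) %>%
  summarize(total = sum(quantity))

batmobile_colors  # one row per color: 0 -> 16, 71 -> 2, 72 -> 4
```

Because summarize() drops the single grouping level, both results are plain ungrouped tables, ready to be joined.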

3. Comparing color schemes of sets

Earlier in this chapter you learned about full_join(), and in Chapter 2 you learned about replace_na() from tidyr. Using these together, you can combine both tables into one that keeps every color appearing in either set, and replace the NAs in the total_batmobile and total_batwing columns with zeros, since a color missing from one set has no pieces there. This is the format you'll want for comparing the color schemes of the two sets.
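A sketch of the combining step follows, assuming per-color tables shaped like the aggregated ones (color_id plus total) with invented numbers. Passing suffix = c("_batmobile", "_batwing") is one way to obtain the total_batmobile and total_batwing column names; combined is a hypothetical name for the result.

```r
library(dplyr)
library(tidyr)

# Hypothetical per-color totals (made-up numbers)
batmobile_colors <- tibble(color_id = c(0, 71, 72), total = c(16, 2, 4))
batwing_colors   <- tibble(color_id = c(0, 4, 72),  total = c(5, 3, 7))

combined <- batmobile_colors %>%
  # Keep colors found in either set; the suffixes rename the clashing
  # total columns to total_batmobile and total_batwing
  full_join(batwing_colors, by = "color_id",
            suffix = c("_batmobile", "_batwing")) %>%
  # A color absent from one set comes back as NA there; it means 0 pieces
  replace_na(list(total_batmobile = 0, total_batwing = 0))

combined
```

Color 71 exists only in the Batmobile table and color 4 only in the Batwing table, so each picks up a 0 in the other set's column instead of an NA.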

4. Adding the color names

We still have only the color IDs, so we'll bring in the color names with an inner join, matching the color_id column to the id column in the table of colors. There's still a little more dplyr processing to do before we can make a meaningful comparison of the two sets. First, the two quantities are hard to compare because the two sets have different total numbers of pieces. You'll want to normalize each color's count by turning it into a fraction of that set's total.
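Here is a sketch of the name lookup. The colors table below is a hypothetical stand-in with id, name and rgb columns; its ids and hex codes are illustrative, and combined and combined_named are hypothetical names.

```r
library(dplyr)

# Hypothetical combined table, as produced by the full join step
combined <- tibble(
  color_id        = c(0, 71, 72, 4),
  total_batmobile = c(16, 2, 4, 0),
  total_batwing   = c(5, 0, 7, 3)
)

# Hypothetical color lookup table (illustrative ids and hex codes)
colors <- tibble(
  id   = c(0, 4, 71, 72),
  name = c("Black", "Red", "Light Bluish Gray", "Dark Bluish Gray"),
  rgb  = c("#05131D", "#C91A09", "#A0A5A9", "#6C6E68")
)

# Match color_id on the left to id on the right; the key keeps the
# left-hand name (color_id), and name and rgb are added as new columns
combined_named <- combined %>%
  inner_join(colors, by = c("color_id" = "id"))

combined_named
```

An inner join fits here because every color ID in the sets should have an entry in the lookup table; any that did not would be dropped.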

5. Adding fractions

You've probably learned before that you can add or change columns with the mutate() verb. You can turn the columns into fractions by dividing each column by its sum: total_batmobile divided by sum(total_batmobile), and likewise for total_batwing. Now, instead of looking at the raw number of pieces, you can see that the Batmobile is 51.6% black pieces, while the Batwing is only 39.7% black pieces.
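A minimal sketch of the normalization, on a made-up table of raw counts (colors_counts and fractions are hypothetical names, and the numbers are invented, so they will not reproduce the percentages quoted above):

```r
library(dplyr)

# Hypothetical joined table of raw piece counts per color
colors_counts <- tibble(
  name            = c("Black", "Light Bluish Gray", "Dark Bluish Gray", "Red"),
  total_batmobile = c(16, 2, 4, 0),
  total_batwing   = c(5, 0, 7, 3)
)

# Divide each count column by its own sum, so each column totals 1
fractions <- colors_counts %>%
  mutate(total_batmobile = total_batmobile / sum(total_batmobile),
         total_batwing   = total_batwing / sum(total_batwing))

fractions
```

Each column is divided by its own total, so a set with more pieces overall no longer dominates the comparison.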

6. The difference between fractions

There's one more step in processing the joined data. What you care about most is the difference between the fractions: the Batmobile has relatively more black pieces, and the Batwing has relatively more dark bluish gray. You can add this as one more step in the mutate(): difference = total_batmobile - total_batwing. We'll save the result as colors_joined. This has taken a lot of work, but now that you've processed the joined data, you can easily see which colors are more represented in one set or the other.
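Put together, the steps in this lesson can be sketched as one pipeline that ends in colors_joined. The input tables are hypothetical stand-ins with invented values, and the suffix argument is an assumed way to get the total_batmobile and total_batwing names.

```r
library(dplyr)
library(tidyr)

# Hypothetical per-color totals for each set (made-up numbers)
batmobile_colors <- tibble(color_id = c(0, 71, 72), total = c(16, 2, 4))
batwing_colors   <- tibble(color_id = c(0, 4, 72),  total = c(5, 3, 7))

# Hypothetical color lookup table (illustrative ids and hex codes)
colors <- tibble(
  id   = c(0, 4, 71, 72),
  name = c("Black", "Red", "Light Bluish Gray", "Dark Bluish Gray"),
  rgb  = c("#05131D", "#C91A09", "#A0A5A9", "#6C6E68")
)

colors_joined <- batmobile_colors %>%
  full_join(batwing_colors, by = "color_id",
            suffix = c("_batmobile", "_batwing")) %>%
  replace_na(list(total_batmobile = 0, total_batwing = 0)) %>%
  inner_join(colors, by = c("color_id" = "id")) %>%
  # Normalize to fractions, then compare; a positive difference means the
  # color makes up a larger share of the Batmobile than of the Batwing
  mutate(total_batmobile = total_batmobile / sum(total_batmobile),
         total_batwing   = total_batwing / sum(total_batwing),
         difference      = total_batmobile - total_batwing)

colors_joined
```

Because mutate() evaluates its arguments in order, difference is computed from the already-normalized columns. Since each fraction column sums to 1, the differences across all colors sum to 0.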

7. Visualizing the data

After processing the data, we're ready to visualize it. This isn't a visualization course, though, so we won't go over how the visualization works; we'll provide the visualization code for you in the exercises. If you're interested, the code uses scale_fill_manual() to make each bar's fill match that color's RGB value, and fct_reorder() from forcats to order the bars meaningfully.
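For the curious, here is one plausible way to build such a chart. It is a sketch, not the code provided in the exercises. It assumes colors_joined has name, rgb and difference columns and uses invented values. color_palette and plot_data are hypothetical names, and ordering the bars by difference is an assumed reading of "meaningfully".

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical processed table shaped like colors_joined (invented values)
colors_joined <- tibble(
  name       = c("Black", "Light Bluish Gray", "Dark Bluish Gray", "Red"),
  rgb        = c("#05131D", "#A0A5A9", "#6C6E68", "#C91A09"),
  difference = c(0.39, 0.09, -0.28, -0.20)
)

# Named vector: each color name maps to its hex code for scale_fill_manual()
color_palette <- setNames(colors_joined$rgb, colors_joined$name)

# Turn name into a factor whose levels run from most negative to most
# positive difference, so the bars are not drawn alphabetically
plot_data <- colors_joined %>%
  mutate(name = fct_reorder(name, difference))

p <- ggplot(plot_data, aes(x = name, y = difference, fill = name)) +
  geom_col() +
  # Fill each bar with its own LEGO color; no legend is needed
  scale_fill_manual(values = color_palette, guide = "none")
```

Printing p draws the chart. Colors that are relatively more common in the Batwing (negative difference) land on the left, and those more common in the Batmobile land on the right.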

8. Visualizing the data

Here's the resulting bar plot. Thanks to your joining and post-processing, you've got an interpretable visualization of the comparison between the two sets. The bars on the right, like Black and Light Bluish Gray, have positive differences, meaning those colors are more common in the Batmobile set. The bars on the left, like Red and Dark Bluish Gray, have negative differences, meaning they are more common in the Batwing. This shows how joining tables fits with other data manipulation and visualization tasks as part of a larger data science workflow.

9. Comparing Batman and Star Wars themes

In the exercises, you'll use a similar approach to compare two entire LEGO themes to discover the differences in LEGO color schemes between Batman and Star Wars.

10. Let's practice!

So, let's practice!
