Get startedGet started for free

The left_join verb

1. The left_join verb

What's the difference between the Batmobile and the Batwing?

2. Batmobile vs. Batwing

Well, they're both vehicles owned by the superhero Batman, though the Batmobile is a car and the Batwing can fly. We might not be able to tell which one would win in a fight, but, with the datasets we're using, we can look at what parts are included in one of their LEGO sets but not the other.

3. Recall: inner join

Let's start with an inner join combining the inventory parts and inventories tables, where you saw in Chapter 1 that this is how you combine sets and inventory parts. For simplicity, we're also using select to remove the id and version variables, since we don't need them, and arrange to sort in descending order of quantity, which represents how many of a piece appear in each set.

4. Filter for LEGO sets

Now, you can use the filter verb to extract just two LEGO sets- the Batmobile and the Batwing- based on their set numbers. We're giving you the set numbers here, and we'll be doing something similar for you in each of the exercises.

5. Comparing tables

This results in two tables, each containing three variables, that look like this.

6. Parts and pieces

Each observation is a LEGO piece. Recall that a piece is a combination of a part and a color, like a gray wheel or a red 2 by 4 brick. For now, we're leaving out the part and color names: you've learned how you can join them in when you want them.

7. Joining with multiple columns

As you learned in Chapter 1, you can find out what pieces are in common between the Batmobile and the Batwing using an inner join. But, unlike the joins in Chapter 1, there are two columns you need to join on. Each of these observations isn't just a part, but a combination of a part and a color. Notice, you can specify this with by equals c quote part-underscore-num unquote comma quote color-underscore-id unquote. That specifies we want to join on both columns. Notice, we also added the suffixes batmobile and batwing, so that we can tell the quantity columns apart. This shows that the first part appears 28 times in the Batmobile set, but only 17 times in the Batwing set.

8. The left join

But what if we wanted to keep parts that were in the Batmobile but not the Batwing? An inner join keeps only observations that appear in both tables. But if you want to keep all the observations in one of the tables, you can use a different dplyr verb: left join. Notice that the quantity batwing column now has NAs in it, representing missing values. For instance, the first piece, with part num 3023 and color id 72, appears 62 times in the Batmobile but never appears in the Batwing.

9. Join review

We didn't have that piece in the inner join: since it wasn't in the second table, it didn't appear in the join. That's what left join means: you keep all the observations in the first, or "left", of the two tables, whether or not it occurs in the second, or "right", table. With a few more joins to the colors and parts tables, you could add the details on each part and color. You'll practice doing this in the exercises.

10. Let's practice!

Let's get to it!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.