Get startedGet started for free

The inner_join verb

1. The inner_join verb

In other DataCamp courses, you may have learned to use the powerful dplyr package to explore and transform a dataset. But the information you'll need for an analysis isn't always confined to one table.

2. LEGO dataset

In this course, you'll master the important skill of joining multiple tables together, so that they can be analyzed in combination. You'll work with a fun dataset about the construction toys known as LEGOs. The data comes from the Rebrickable website, and has tons of fun information about the sets, parts, themes, and colors that make up LEGO history. The dataset is fascinating, but it's spread across many tables. You'll run into this kind of data a lot if you work in data science.

3. The sets table

In this chapter you'll be focusing on one dplyr verb, inner join, and you'll start by working with the sets table in the LEGO data. This table contains one row for each of the 4,977 LEGO sets, starting with sets like "Medium Gift Set" back in 1949. Notice that there is a column that's not useful on its own: theme-underscore-id. That's because the useful information- the theme name- is in a separate table, called themes.

4. Linking two tables

The theme id variable in the sets table links to the id variable in the themes table. For any individual set, we could find a theme that matches it.

5. Inner join

To see the theme that each set is associated with, we'll need to join the two tables. To do this, you use inner join. This joins the first table "sets" to the second table "themes". Notice that the argument by equals theme-underscore-id equals id. That tells inner join how to match the tables: linking theme id in the first table to id in the second table. Notice that in the output, you've combined the two tables, combining each set with its theme. But because both tables had a variable called name, you end up with name-dot-x, with the set's name, and name-dot-y, with the theme's name, because you can not have two variables with the same name. Inner join lets you customize this to be more readable!

6. Customizing your join

Add another argument, suffix equals c underscore set, comma, underscore theme. This appends underscore set or underscore theme to the shared columns, which gets the much more readable name underscore set and name underscore theme.

7. Most common themes

Now we can answer interesting questions about the data. For instance, we could find out what the most common themes are in LEGO history by piping again to count name underscore theme, with sort equals TRUE. This pattern of taking two tables, finding a link between them, and joining them together is very common, and will enable you to make a lot of interesting discoveries throughout this course.

8. Other LEGO tables

For starters: in the exercises you'll learn about two new tables from the LEGO dataset, parts and part-underscore-categories, and then practice joining them together.

9. Part

A part is a shape, like a gear, a 2 by 4 brick, or a figurine, and they'll come up a lot in this course.

10. Let's practice!

So, let's practice!