Get startedGet started for free

The merge function

1. The merge function

Now that you can successfully identify join keys, it's time to learn how to perform joins using the merge() function.

2. Joins

In this chapter, you will learn four different joins. The inner join, the full join, the left join, and the right join. Each of these joins give a different result, based on what observations are present in one data table but not in the other. All four are standard joins that originally come from database query languages, such as SQL. So the concepts and skills you will learn in this chapter are widely applicable, not just for the data table package in R. In this chapter, you will learn how to do these joins using the merge() function. This function comes from base R but is extended in the data table package to work efficiently with data tables.

3. Inner join

An inner join combines the columns of two data tables, keeping only the observations present in both data tables, that is, rows whose value in the join key column can be found in both data tables. Returning to our example customer database, an inner join of the demographics data table, shown in blue, to the shipping addresses data table, shown in orange, creates a new data table containing all the columns from both data tables, containing just the people with entries in both; those highlighted in pink, yellow, and green. An inner join is the default behavior of the merge() function. It takes two data tables as inputs, one to its x and one to its y arguments, along with the name of the key column in each data table to the by.x and by.y arguments respectively.

4. The by argument

When the key columns have the same name in both data tables, you can use the by argument instead to avoid typing the column name twice.

5. Full join

If you want to keep all observations that are present in either data table you can supply an additional argument to the merge() function; setting the all argument to be equal to TRUE. This is known as a full join or a full outer join. Observations which were present in only one data table will have missing values in the columns from the other data table, as shown in the white cells in the result.

6. Let's practice!

In the next lesson, you will learn about left joins and right joins. But now, it's time for you to practice inner joins and full joins.

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.