1. data.table syntax
Congratulations on completing the first chapter and welcome back. In this lesson, you will perform joins using the data table syntax.
2. Recap of the data.table syntax
The general form of a data table is shown here. It can be read out loud as "Take DT, filter its rows in i, then compute on columns j grouped by values in columns by.
3. Joins
To perform a join using this syntax, you pass another data table to the i argument. This can be read out loud as "Join DT to i using the column in on as the join key".
You can think of it as just any other filter operation in i, except here you're filtering the rows in DT based on those that have matches to rows in i.
4. Right joins
This naturally leads to the definition of a right join. Like any other subset operation in R, values that have no match return missing values. So here, rows from the data table you are subsetting on in the i argument are kept in the result, with missing values where there was no match in the main data table.
5. The on argument
Join key columns are supplied to the on argument. This follows similar rules to the j and by arguments in the data table syntax. If you use the list function or its alias you can enter the join key column name as if it was a variable.
You can also use a character vector to specify the join key. This also allows you to store the join key in a variable that you pass to the on argument.
Note that here we haven't used the list function or its alias, so the data table syntax looks for the join underscore key variable, not for a column with the same name.
6. Left joins
To perform a left join you can swap the order of the data tables, so that the data table on the left side of the join is passed to the i argument.
To left join the demographics data table shown in blue to the shipping data table shown in orange we swap the order of data tables from the previous slide and perform a right join on the demographics data table.
7. Inner joins
To perform an inner join you supply an additional argument, nomatch equals zero. This tells the data table syntax to ignore rows that cannot be matched between the two data tables, giving you an inner join.
8. Full joins
It is not possible to perform a full join with the data table syntax, so you will need to rely on the merge function you learned in chapter 1. Remember to set all equal to TRUE to perform a full join!
9. Anti-joins
Finally, another type of join you can perform using the data table syntax is an anti-join. This filters the main data table on the left so that it contains only rows that have no match in the data table on the right.
Anti joins are performed by adding an exclamation mark, known as the "not" operator, before the data table you want to filter on in the i argument.
10. Let's practice!
Later on in this chapter you will learn how to use the other functionality of the data table syntax while joining data tables. For now, its your turn to try out joins using the data table syntax.