Get startedGet started for free

Incorporating joins into your data.table workflow

1. Incorporating joins into your data.table workflow

The real power of using the data table syntax for performing joins is that it allows you to incorporate joins into your other data table workflows. This enables you to perform rapid data analysis when your data is spread across multiple data tables.

2. Chaining data.table expressions

The most flexible way of incorporating joins into your data table workflows is by chaining data table expressions. In the first expression, you perform your join operation. Then the second data table expression allows you to work on the result with any of the other data table expression functionality you've learned in the previous course. The general form on the slide can be read as "Join DT1 to DT2 using the column in on as the join key. Then in the result, filter by rows in i, compute on columns j grouped by values in columns by."

3. Join then compute

Here's an example. We have two data tables, one filled with information about the business's customers,

4. Join then compute

and a second with details of the purchase history.

5. Join then compute

To get the average spent per sale for return customers grouped by gender, we chain two data table expressions. In the first, we join customers to purchases, then in the second, we filter to customers with more than 1 sale, calculate the average spent per sale in j, grouped by the gender column in by.

6. Computation with joins

You can also use the j argument to compute on columns in the same data table expression as a join. This is a memory efficient way to perform calculations on the result of a join because the data table expression will create only the join result for the columns used in j in memory. This can save a lot of time and memory for large data tables!

7. Joining and column creation

Column creation can also be performed in the same data table expression as a join. The new column will be added to the main data table on the left side of the join. In the example on the slide, after joining purchases to customers, we are adding a new column, return_customers to the customers data table, which is computed based on the sales column from the purchases data table in the result of the join.

8. Grouping by matches

The by argument gains a special symbol, .EACHI, when used in a join expression. This lets you group computation in j by each row in the data table on the right side of the join. This can be read as "Join DT1 to DT2, then for each row in DT2 group calculations in j by the matching rows in DT1."

9. Grouping by matches

In this example, we used by = .EACHI to count the number of shipping addresses associated with each person in the customers data table.

10. Grouping by columns with joins

The by argument also works slightly differently than normal when used in a join expression. It can only be provided columns from the main data table on the left side of the join, and you can only perform computation on columns from the main data table.

11. Grouping by columns with joins

For example, you could left join shipping to customers to calculate the average age, grouped by gender, of customers with shipping addresses, because both the age and gender columns are in the customers data table.

12. Let's practice!

Now its time for you to join data tables while doing some computations!