Concatenating two or more data.tables

1. Concatenating data.tables

Welcome back! In this lesson you will learn how to work with datasets whose rows are spread across multiple data tables by concatenating them into a single data table.

2. Same columns, different data.tables

Sometimes datasets you want to work with are split across multiple data tables, usually because they've been read in from multiple files. You may want to concatenate these rows into a single data table.

3. Concatenation functions

There are two functions you can use to concatenate data tables: rbind() and rbindlist().

4. The rbind() function

The rbind() function takes any number of data tables as inputs, and concatenates their rows into a single data table.
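As a minimal sketch of this, assuming two small made-up tables named sales_2015 and sales_2016 (hypothetical names and values, not from the lesson data):

```r
library(data.table)

# Hypothetical example data: the same columns split across two data.tables
sales_2015 <- data.table(quarter = 1:2, amount = c(100, 150))
sales_2016 <- data.table(quarter = 1:2, amount = c(120, 170))

# Stack the rows of both tables into a single data.table
sales <- rbind(sales_2015, sales_2016)
print(sales)
```

The result keeps the column names of the inputs and has one row for every row in either table.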

5. Adding an identifier column

Each input data table can be given a name using the equals sign (name = data table), and an extra argument, "idcol", tells rbind() to use these names to create an extra column in the result that indicates each row's data table of origin.
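A short sketch of named inputs combined with idcol might look like the following. The table names, the labels "2015" and "2016", and the column name "year" are all illustrative assumptions:

```r
library(data.table)

# Hypothetical example data
sales_2015 <- data.table(quarter = 1:2, amount = c(100, 150))
sales_2016 <- data.table(quarter = 1:2, amount = c(120, 170))

# Name each input; idcol = "year" stores those names in a new first column
sales <- rbind("2015" = sales_2015, "2016" = sales_2016, idcol = "year")
print(sales)
```

Each row now carries the label of the table it came from in the year column.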

6. Adding an identifier column

If you use the idcol argument without naming the data tables, they are numbered instead, in the order they were passed.
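The unnamed case can be sketched the same way. Here the column name "source" and the example tables are assumptions made for illustration:

```r
library(data.table)

# Hypothetical example data
sales_2015 <- data.table(quarter = 1:2, amount = c(100, 150))
sales_2016 <- data.table(quarter = 1:2, amount = c(120, 170))

# No names supplied, so the identifier column holds the input position (1, 2, ...)
sales <- rbind(sales_2015, sales_2016, idcol = "source")
print(sales)
```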

7. Adding an identifier column

And if you don't supply a column name to idcol and just set it to TRUE, the column will be named ".id".
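A quick way to see the default column name, again using made-up example tables:

```r
library(data.table)

# Hypothetical example data
sales_2015 <- data.table(quarter = 1:2, amount = c(100, 150))
sales_2016 <- data.table(quarter = 1:2, amount = c(120, 170))

# idcol = TRUE adds the identifier column under its default name
sales <- rbind(sales_2015, sales_2016, idcol = TRUE)
print(names(sales))
```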

8. Handling missing columns

When the input data tables don't all have the same columns, you will need to set the fill argument to TRUE. Any column that is missing from a data table is then filled with NAs for that table's rows in the result.
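To illustrate, here is a small sketch with two hypothetical tables, dt_a and dt_b, where only dt_b has a column y:

```r
library(data.table)

# Hypothetical example data: dt_b has an extra column y
dt_a <- data.table(id = 1:2, x = c(1.5, 2.5))
dt_b <- data.table(id = 3:4, x = c(3.5, 4.5), y = c("a", "b"))

# fill = TRUE pads the rows that came from dt_a with NA in column y
combined <- rbind(dt_a, dt_b, fill = TRUE)
print(combined)
```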

9. Handling missing columns

If you don't do this, the code will result in an error telling you that the number of columns is inconsistent between the inputs.
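One way to observe the failure without stopping a script is to wrap the call in tryCatch(). This is only a sketch with made-up tables, and the exact wording of the message may vary between data.table versions:

```r
library(data.table)

# Hypothetical example data with different numbers of columns
dt_a <- data.table(id = 1:2, x = c(1.5, 2.5))
dt_b <- data.table(id = 3:4, x = c(3.5, 4.5), y = c("a", "b"))

# Without fill = TRUE the call fails; capture the error object instead of halting
result <- tryCatch(rbind(dt_a, dt_b), error = function(e) e)

if (inherits(result, "error")) {
  message("rbind() failed: ", conditionMessage(result))
}
```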

10. The rbindlist() function

You can use the rbindlist() function if your data tables are stored as elements of a single list. An example of where you might find this useful is when you have several files you want to import into a single data table. Rather than reading each file into separate variables, you could load these into a list by lapply()-ing the fread() function over a vector of file names, and then use the rbindlist() function to concatenate their rows into a single data table.
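The file-import pattern described above can be sketched as follows. To keep the example self-contained it first writes two small CSV files to a temporary directory; the file names and contents are assumptions for illustration only:

```r
library(data.table)

# Hypothetical setup: create two small CSV files in a temporary directory
tmp_dir <- tempfile("sales_")
dir.create(tmp_dir)
fwrite(data.table(quarter = 1:2, amount = c(100, 150)),
       file.path(tmp_dir, "sales_2015.csv"))
fwrite(data.table(quarter = 1:2, amount = c(120, 170)),
       file.path(tmp_dir, "sales_2016.csv"))

# Collect the file paths, read each file into a list element, then stack them
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
table_list <- lapply(files, fread)
sales <- rbindlist(table_list)
print(sales)
```

In real use, the setup step would be replaced by a vector of paths to files that already exist.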

11. Adding an identifier column

When using the rbindlist() function, the idcol argument uses the names of the list elements when creating the additional identifier column.
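For example, with a named list of hypothetical tables (the list names and the column name "year" are assumptions):

```r
library(data.table)

# Hypothetical example data stored as a named list
table_list <- list(
  "2015" = data.table(quarter = 1:2, amount = c(100, 150)),
  "2016" = data.table(quarter = 1:2, amount = c(120, 170))
)

# The list names become the values of the identifier column
sales <- rbindlist(table_list, idcol = "year")
print(sales)
```

When the list comes from lapply() over file names, one option is to set names(table_list) to those file names before calling rbindlist().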

12. Handling different column orders

Finally, the use.names argument in both rbind() and rbindlist() controls how columns are lined up. When it is TRUE, columns are matched by their names when concatenating data tables, even if they appear in a different order in each one.
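A minimal sketch of name matching, assuming two made-up tables that hold the same columns in a different order:

```r
library(data.table)

# Hypothetical example data: same column names, different order
dt_a <- data.table(id = 1:2, score = c(10, 20))
dt_b <- data.table(score = c(30, 40), id = 3:4)

# use.names = TRUE lines columns up by name rather than by position
combined <- rbind(dt_a, dt_b, use.names = TRUE)
print(combined)
```

The result follows the column order of the first table, with the values of dt_b placed under the matching names.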

13. `data.tables` with different column names

Setting use.names to FALSE allows you to concatenate data tables whose columns have different names. The result takes its column names from the first data table.
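As an illustration, the two hypothetical tables below hold the same kind of data under different column names:

```r
library(data.table)

# Hypothetical example data: equivalent columns with different names
dt_a <- data.table(id = 1:2, score = c(10, 20))
dt_b <- data.table(identifier = 3:4, points = c(30, 40))

# use.names = FALSE binds by position and ignores the names in dt_b
combined <- rbind(dt_a, dt_b, use.names = FALSE)
print(combined)
```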

14. Pitfalls of `use.names = FALSE`

But you need to be careful, because columns are then always concatenated by position, in the order they are found in each data table, even when the names show that they hold different things.
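The pitfall is easiest to see with tables that share column names but store them in a different order. This sketch uses made-up data:

```r
library(data.table)

# Hypothetical example data: same names, but the columns are swapped in dt_b
dt_a <- data.table(id = 1:2, score = c(10, 20))
dt_b <- data.table(score = c(30, 40), id = 3:4)

# Binding by position silently puts dt_b's scores under id and its ids under score
mixed <- rbind(dt_a, dt_b, use.names = FALSE)
print(mixed)
```

No error is raised here, so the mix-up is only visible by inspecting the values.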

15. Differing defaults

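The use.names argument has different defaults in the rbind() and rbindlist() functions. By default it is TRUE in the rbind() function, but in the rbindlist() function its default was originally FALSE, changing to TRUE if you set fill equal to TRUE. In data.table version 1.12.2 and later, the rbindlist() default is "check", which still binds by position but warns if the data tables don't all have the same names in the same order.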
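Because the defaults differ, one defensive habit is to pass use.names explicitly to both functions. The sketch below, using made-up tables, shows that the two calls then agree:

```r
library(data.table)

# Hypothetical example data: same names, different column order
dt_a <- data.table(id = 1:2, score = c(10, 20))
dt_b <- data.table(score = c(30, 40), id = 3:4)

# With use.names stated explicitly, neither call depends on its default
via_rbind     <- rbind(dt_a, dt_b, use.names = TRUE)
via_rbindlist <- rbindlist(list(dt_a, dt_b), use.names = TRUE)

print(via_rbind)
print(via_rbindlist)
```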

16. Let's practice!

Now it's your turn to see how these functions work in practice.