1. Problem columns
In the last lesson you learned how to work with complex join keys, and how to diagnose and fix problems arising from misspecified joins.
In this lesson you will learn how to diagnose common problems you may encounter with data table columns when joining.
2. Common column names
When joining two data tables you may sometimes find that they share column names which are not the join keys.
An example is shown on the slide. Both data tables have "name", "gender", and "age" columns, and you can see that to join these two data tables, you would need to match the name and parent columns for the join keys.
Common column names like this pose two challenges: they make it harder to correctly identify the join keys and they make it harder to interpret the result of the join.
3. Common column names
So what happens to these columns in a join?
With the data table syntax, columns from the data table passed to the i argument will have "i dot" added to the start of their column names.
4. Common column names with merge()
When using the merge() function, these common column names will instead have "dot x" or "dot y" added to the end, denoting which of the input data tables they came from.
5. Adding context with your own suffixes
The suffixes argument to the merge() function allows you to use different suffixes than just "dot x" and "dot y". This is a useful way of providing helpful context to the result of the join.
6. Renaming columns
Another solution is to rename these columns, either before or after the join. This can be done with the setnames() function.
You can either give it a single character vector to give new names to all the columns in a data table,
or two character vectors to rename a subset of columns.
7. Joining with `data.frames`
Another problem you may run into is mixing of data tables and data frames in your R session.
The merge() function also works for data frames, and can also be used to join a data frame with a data table.
Sometimes, data frames store the unique identifying information about their rows in the rownames. You will need to extract these if you want to use them as the join keys or want to keep that information in the result of the join.
The simplest way to do this is to convert it to a data table using the as dot data dot table function along with the keep dot rownames argument to convert this into a column with the column name of your choice.
8. Let's practice!
It's time to join data tables with tricky columns.