1. Welcome to the course
Hi, my name is Scott Ritchie. I'll be your instructor for this course on joining data in R with data.table. Welcome, and I look forward to seeing you in the course.
2. Joining data.tables
A join combines information from two different data tables into a single data table. This is a fundamental skill when working with multiple data sources. The majority of R's functions for analyzing and visualizing data are designed to work on a single data frame or data table. But you'll often find that the data you want to analyze is spread across multiple datasets that may come from different sources.
For example, you might be working with two data tables in your customer database: one containing customers' demographic information, shown in blue, and another containing their shipping addresses, shown in orange. The question is: how do you build a single data table containing all the information about each customer? Joins are an efficient and reliable way of solving this type of problem.
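As a rough sketch of the customer example, the snippet below combines two small tables with the merge function covered in chapter one. The table names (`demographics`, `shipping`), the `age` and `address` columns, and all values are made up for illustration; only the idea of matching customers across two tables comes from the example above.

```r
library(data.table)

# Hypothetical customer data, invented for illustration only
demographics <- data.table(name = c("Mark", "Matt", "Angela"),
                           age  = c(54, 43, 39))
shipping <- data.table(name    = c("Matt", "Angela", "Michelle"),
                       address = c("12 High Street", "3 Mill Lane", "7 Park Road"))

# Combine the two tables, matching rows on the shared "name" column.
# By default merge() keeps only the customers present in both tables.
customers <- merge(demographics, shipping, by = "name")
customers
```

With these made-up values, only the customers that appear in both tables end up in `customers`; the other join types in chapter one change which unmatched rows are kept.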
3. Course overview
In chapter one of the course, you'll learn how to use the merge function to perform four types of joins that you can find in any data-driven language.
In chapter two you'll learn how to incorporate joins directly into your data table workflows.
In chapter three, you'll learn how to diagnose and avoid common join errors.
Finally, in chapter four you'll learn how to concatenate data tables that have the same columns, and how to transform them between wide and long formats.
4. Table keys
The first skill you need to learn is to identify the join key columns. These are the columns you need in each data table to match the rows between them for a join. No matter what type of join you want to do, you will always need to know which columns to use as join keys.
Returning to the customer database example, to match the rows between the two data tables you would need to use the values stored in the name column, as you can see from their highlighted matching values.
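One way to check a candidate join key in code, sketched here with hypothetical tables (the names `demographics` and `shipping` and all values are assumptions, not the course data), is to look for column names shared by both tables and then for values that appear in both key columns.

```r
library(data.table)

# Hypothetical customer data, invented for illustration only
demographics <- data.table(name = c("Mark", "Matt", "Angela"),
                           age  = c(54, 43, 39))
shipping <- data.table(name    = c("Matt", "Angela", "Michelle"),
                       address = c("12 High Street", "3 Mill Lane", "7 Park Road"))

# Columns that exist in both tables are candidate join keys
key_cols <- intersect(names(demographics), names(shipping))
key_cols

# Values of the candidate key that appear in both tables
matching <- intersect(demographics$name, shipping$name)
matching
```

A shared column name is only a hint. The values also need to identify the same thing in both tables, which is why it helps to look at the matching values as well.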
5. Inspecting `data.tables` in your R session
To identify join keys, you will need to learn about the contents of the data tables you are working with.
There are a few different ways you can do this.
The first way is using the tables() function. It will show you all data tables in your R session, along with their number of rows, the number and names of their columns, and how much memory they occupy.
It will also tell you any columns you have set as their keys, which you'll learn about in the next chapter.
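A minimal sketch of calling tables(), assuming two hypothetical tables have been created in the session (their names and contents are invented for illustration):

```r
library(data.table)

# Hypothetical tables, invented for illustration only
demographics <- data.table(name = c("Mark", "Matt", "Angela"),
                           age  = c(54, 43, 39))
shipping <- data.table(name    = c("Matt", "Angela", "Michelle"),
                       address = c("12 High Street", "3 Mill Lane", "7 Park Road"))

# Prints one summary row per data.table in the session and
# invisibly returns that summary as a data.table
info <- tables()

# The summary can be queried like any other data.table
info[NAME == "demographics"]
```

Capturing the result in `info` is optional; calling tables() on its own is enough to see the summary in the console.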
6. Inspecting `data.tables` in your R session
Another way is using the str() function. This is a general purpose function that will show you the type of data stored in any R object. In this case, the object is a data table, so it shows the type and first few entries of each column.
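For instance, str() can be called on a small hypothetical table (the name `demographics` and its values are made up for illustration):

```r
library(data.table)

# Hypothetical table, invented for illustration only
demographics <- data.table(name = c("Mark", "Matt", "Angela"),
                           age  = c(54, 43, 39))

# Shows the object's class and dimensions, then one line per column
# with its type and first few values
str(demographics)

# The column types can also be pulled out on their own
col_types <- sapply(demographics, class)
col_types
```

Comparing the column types of two tables this way is a quick check that a candidate join key is stored the same way in both.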
7. Inspecting `data.tables` in your R session
Finally, typing in the variable name and hitting enter in the console will show you the values stored in a data table. If it has more than 100 rows, only the first and last five rows are displayed by default.
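The printing behaviour can be seen with a throwaway table. This is a sketch only: the tables `big` and `small` and their columns are invented, and it assumes the default data.table print options have not been changed in your session.

```r
library(data.table)

# Hypothetical table with more than 100 rows
big <- data.table(id = 1:200, value = (1:200)^2)

# More than 100 rows: only the first and last five rows are printed
big

# A table with 100 rows or fewer is printed in full
small <- big[1:10]
small

# The number of rows shown at each end can be changed for a single call
print(big, topn = 3)
```

The `topn` argument used in the last call is an argument of data.table's print method.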
8. Let's practice!
Now let's explore some of the data tables you will be using in this course.