1. Selecting columns from a data.table
2. General form of data.table syntax (Recap)
Here's the general form of the data table syntax, shown again for convenience. It is read out loud as: take DT, filter rows in "i", then compute "j", grouped by "by". In this chapter, you will work with the "j" argument, which is used for selecting and computing on columns.
data table provides several convenient ways of both selecting and directly computing on columns, which makes it easier to write complex calculations in much cleaner code. You can also use the data frame syntax for selecting columns from a data table.
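As a rough sketch of that general form, the snippet below builds a small, hypothetical stand-in for batrips (the values are made up for illustration and are not the course data) and uses all three arguments at once.

```r
library(data.table)

# Hypothetical toy stand-in for batrips; values are invented for illustration
batrips <- data.table(
  trip_id = 1:4,
  duration = c(300L, 1200L, 450L, 900L),
  start_station = c("A", "B", "A", "C"),
  subscription_type = c("Subscriber", "Customer", "Subscriber", "Subscriber")
)

# General form: DT[i, j, by]
#   i:  keep only rows where subscription_type is "Subscriber"
#   j:  compute the mean duration
#   by: do that computation separately for each start_station
res <- batrips[subscription_type == "Subscriber",
               list(mean_dur = mean(duration)),
               by = start_station]
print(res)
```

Only the "j" part is the subject of this chapter; "i" and "by" are included here just to show where "j" sits.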
3. Using column names to select columns
Similar to data frames, you can pass a character vector of column names as the second argument to data table to select the relevant columns.
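A minimal sketch of this, assuming a small made-up batrips table (the real course data has more columns and rows):

```r
library(data.table)

# Hypothetical toy stand-in for batrips; values are invented for illustration
batrips <- data.table(
  trip_id = 1:3,
  duration = c(300L, 1200L, 450L),
  start_station = c("A", "B", "A")
)

# Character vector of column names as the second argument ("j")
ans <- batrips[, c("trip_id", "duration")]
print(ans)
```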
4. Using column names to select columns
One thing to note here is the difference in result when selecting a single column from a data frame vs a data table. When you select a single column from a data frame this way, the result is by default no longer a data frame, but a vector. When you select a single column from a data table, on the other hand, the result is still a data table.
This consistency in the output avoids accidental errors in code.
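The difference can be checked directly. The sketch below uses a made-up toy table and a data frame copy of it, both hypothetical stand-ins for the course data.

```r
library(data.table)

# Hypothetical toy stand-in for batrips, plus a plain data frame copy
batrips <- data.table(trip_id = 1:3, duration = c(300L, 1200L, 450L))
batrips_df <- as.data.frame(batrips)

# Data frame: a single selected column is dropped to a vector by default
from_df <- batrips_df[, "trip_id"]

# Data table: a single selected column is still a data table
from_dt <- batrips[, "trip_id"]

print(class(from_df))
print(class(from_dt))
```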
5. Using column numbers to select columns
You can also select columns using column numbers. Here we select the second and fourth columns in batrips.
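For illustration, here is the same kind of selection on a made-up toy table; the column order of this hypothetical stand-in is an assumption and may not match the real batrips.

```r
library(data.table)

# Hypothetical toy stand-in for batrips; column order is an assumption
batrips <- data.table(
  trip_id = 1:3,
  duration = c(300L, 1200L, 450L),
  start_date = c("2014-01-01", "2014-01-01", "2014-01-02"),
  start_station = c("A", "B", "A")
)

# Select the second and fourth columns by position
ans <- batrips[, c(2, 4)]
print(ans)
```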
However, we do not recommend this approach except when having a quick look at the data interactively.
Using column numbers is considered bad practice. A typical project consists of one or more packages with lots of R code that is constantly updated and improved, usually by multiple contributors. An inadvertent change to the original column order can then silently lead to incorrect results and bugs.
Hence, you should always use column names wherever possible.
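To make the risk concrete, the sketch below reorders the columns of a made-up toy table (a hypothetical example, not the course data) and shows that the same column numbers then pick different columns, while names keep working.

```r
library(data.table)

# Hypothetical toy table; values are invented for illustration
dt <- data.table(
  trip_id = 1:3,
  duration = c(300L, 1200L, 450L),
  bike_id = c(11L, 12L, 13L)
)

# Positions 2 and 3 are duration and bike_id at first
by_pos_before <- names(dt[, c(2, 3)])

# Someone changes the column order (setcolorder modifies dt by reference)
setcolorder(dt, c("bike_id", "trip_id", "duration"))

# The same positions now silently return trip_id and duration
by_pos_after <- names(dt[, c(2, 3)])

# Selecting by name is unaffected by the reordering
by_name <- names(dt[, c("duration", "bike_id")])

print(by_pos_before)
print(by_pos_after)
print(by_name)
```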
6. Deselecting columns with character vectors
You can use a negative sign prefix with a character vector to exclude a set of columns from your result.
You can also use the not-operator to obtain the same result.
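Both forms can be sketched as follows, again on a small made-up stand-in for batrips whose values are invented for illustration.

```r
library(data.table)

# Hypothetical toy stand-in for batrips; values are invented for illustration
batrips <- data.table(
  trip_id = 1:3,
  duration = c(300L, 1200L, 450L),
  start_date = c("2014-01-01", "2014-01-01", "2014-01-02"),
  end_date = c("2014-01-01", "2014-01-01", "2014-01-02")
)

# Exclude columns with a negative sign prefix
minus_ans <- batrips[, -c("start_date", "end_date")]

# Exclude the same columns with the not-operator
not_ans <- batrips[, !c("start_date", "end_date")]

print(names(minus_ans))
print(names(not_ans))
```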
7. Selecting columns the data.table way
So far you have selected columns using the data frame way. One limitation of this approach is that it only allows you to select columns. However, in order to compute on columns and perform advanced data manipulations, you will have to tweak the "j" argument using the data table approach.
Recall how you could filter rows in "i" by referring to columns directly, as if they were variables? You can do the same in "j". Since we usually need to select more than one column, "j" accepts a list of columns. For example, you can select the "trip_id" and "duration" columns as shown here. Notice that there are no quotes around the column names. Also note that since "j" accepts a list, we can rename columns while selecting, e.g., the "duration" column is renamed to "dur".
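A minimal sketch of that selection, assuming a made-up toy table in place of the real batrips:

```r
library(data.table)

# Hypothetical toy stand-in for batrips; values are invented for illustration
batrips <- data.table(
  trip_id = 1:3,
  duration = c(300L, 1200L, 450L),
  start_station = c("A", "B", "A")
)

# Unquoted column names inside list(); duration is renamed to dur on the way
ans <- batrips[, list(trip_id, dur = duration)]
print(ans)
```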
8. Selecting columns the data.table way
If you want to select only one column, you can choose to return the result as a 1-column data table or a vector. Wrapping the column name within "list()" always returns a data table. If you provide just a single column name in "j", it will return a vector.
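The two return types can be compared side by side. The table below is a hypothetical toy example, not the course data.

```r
library(data.table)

# Hypothetical toy stand-in for batrips; values are invented for illustration
batrips <- data.table(trip_id = 1:3, duration = c(300L, 1200L, 450L))

# Wrapped in list(): the result is a 1-column data table
as_dt <- batrips[, list(trip_id)]

# Bare column name: the result is a plain vector
as_vec <- batrips[, trip_id]

print(class(as_dt))
print(class(as_vec))
```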
9. Selecting columns the data.table way
Since we like to write concise, convenient and clear code in data table, we created dot parentheses as an alias for "list()". They both behave in exactly the same way, but the dot parentheses form requires less typing and helps you focus on the actual columns being selected.
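As a quick check of the alias, the sketch below runs the same selection both ways on a made-up toy table (a hypothetical stand-in for batrips) and compares the results.

```r
library(data.table)

# Hypothetical toy stand-in for batrips; values are invented for illustration
batrips <- data.table(trip_id = 1:3, duration = c(300L, 1200L, 450L))

# The same selection written with list() and with the .() alias
with_list <- batrips[, list(trip_id, dur = duration)]
with_dot <- batrips[, .(trip_id, dur = duration)]

# Both forms should give the same data table
same_result <- isTRUE(all.equal(with_list, with_dot))
print(same_result)
```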
10. Let's practice!
Time for you to select columns from a data table!