Chaining data.table expressions

1. Chaining data.table expressions

In this lesson, we will look at another powerful feature of data.table: chaining expressions together.

2. Chaining expressions

What does chaining expressions mean? Instead of assigning the output data.table to an intermediate object and then performing another operation on that object, you apply each operation directly to the output of the previous one by appending another pair of square brackets. Let's look at an example where we find the three shortest trips that last over 1 hour, that is, 3600 seconds. We first filter batrips to the rows where duration is greater than 3600, then sort the resulting data.table by the duration column, and finally subset the first three rows.
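The steps just described can be sketched as follows. This is a minimal illustration, not the lesson's slide code: the small batrips table and its values are made up for the example, and only the duration column name comes from the narration.

```r
library(data.table)

# Hypothetical stand-in for batrips (values invented for illustration)
batrips <- data.table(
  trip_id  = 1:6,
  duration = c(4000, 1200, 3700, 9000, 3600, 5000)
)

# Without chaining: every step needs its own intermediate object
long_trips   <- batrips[duration > 3600]
sorted_trips <- long_trips[order(duration)]
unchained    <- sorted_trips[1:3]

# With chaining: each [] operates on the result of the previous []
shortest_long <- batrips[duration > 3600][order(duration)][1:3]
print(shortest_long)
```

Both versions return the same three rows; the chained one simply avoids naming the intermediate results.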

3. Chaining expressions

Let's move on to a more advanced example. Suppose you want to find the three start stations with the lowest mean duration. This task requires several steps. First, you would compute the mean duration for each start station and store the result in a variable. Then, you would order() that result by the mean duration column in increasing order and choose the first three rows. Instead, you can do all of this in a single step, without temporary variables, by chaining the expressions together. Chaining essentially comes for free, because every data.table expression returns a data.table that the next pair of square brackets can work on. You can also chain data frame operations together. However, a data frame's square brackets support so few operations that chaining them is of little practical use.
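A possible version of this chain is sketched below. The toy data, the start_station column name, and the mn_dur result name are assumptions made for the example; the narration only describes the steps.

```r
library(data.table)

# Hypothetical toy data: four stations with two trips each
batrips <- data.table(
  start_station = c("A", "A", "B", "B", "C", "C", "D", "D"),
  duration      = c(100, 300, 50, 150, 400, 600, 20, 40)
)

# Step 1: mean duration per start station
# Step 2: sort by that mean in increasing order
# Step 3: keep the first three rows
lowest_mean <- batrips[, .(mn_dur = mean(duration)), by = start_station][
  order(mn_dur)][1:3]
print(lowest_mean)
```

Breaking the line after an opening bracket, as done here, keeps longer chains readable without changing what they compute.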

4. uniqueN()

Let's move on to another useful helper function, uniqueN(), which is particularly helpful when used with "by". The unique() function from base R returns all the unique values of the input object. uniqueN() from data.table simply returns the number of unique elements in the input object. It works on vectors, data frames and data.tables alike. The vector id has two unique values, so the result of uniqueN() is 2. When you pass a data.table to uniqueN(), it considers all the columns by default, so it counts unique rows. Since x has no duplicate rows, the result is 4, which is the same as the total number of rows in x. You can also use the "by" argument of uniqueN() to count the unique values in a specific column of a data.table. Again, as the id column contains only 2 unique values, the result is 2.
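The three calls can be reproduced with a small example. The concrete values of id and x below are assumptions chosen to match the counts mentioned in the narration (two unique ids, four distinct rows).

```r
library(data.table)

# Assumed values: two distinct ids, each appearing twice
id <- c(1, 2, 1, 2)
n_ids <- uniqueN(id)               # unique elements of a vector

# Assumed table: four rows, none of them duplicated
x <- data.table(id = id, val = 1:4)
n_rows  <- uniqueN(x)              # all columns considered, so unique rows
n_by_id <- uniqueN(x, by = "id")   # only the id column considered

print(c(n_ids, n_rows, n_by_id))
```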

5. uniqueN() together with by

uniqueN() can be a very handy function, particularly when combined with the "by" argument of data.table. Let's say you'd like to compute the number of unique bike ids for each month. You can do this by first grouping batrips by month. Remember that you can apply the month() function to the start_date column in the "by" argument to do this. Then you can call uniqueN() on bike_id in the "j" argument, which gives the number of unique bike ids for each month.
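Here is a minimal sketch of that grouped count. The five trips are invented for illustration, and the result column names mon and n_bikes are arbitrary choices; start_date and bike_id are the columns named in the narration.

```r
library(data.table)

# Hypothetical trips: three in January, two in February
batrips <- data.table(
  start_date = as.POSIXct(
    c("2014-01-05 10:00:00", "2014-01-20 12:00:00", "2014-01-25 09:00:00",
      "2014-02-03 08:00:00", "2014-02-10 18:00:00"),
    tz = "UTC"
  ),
  bike_id = c(11L, 11L, 42L, 7L, 7L)
)

# Group by month of start_date, count distinct bikes within each group
bikes_per_month <- batrips[, .(n_bikes = uniqueN(bike_id)),
                           by = .(mon = month(start_date))]
print(bikes_per_month)
```

In this toy data bike 11 makes two January trips but is counted once, which is the difference between uniqueN(bike_id) and a plain row count per group.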

6. Let's practice!

Now let's try some examples.
