1. Grouped aggregations
Suppose you want to filter batrips so that it contains data only for those zip codes with more than 1000 trips. How would you do that?
2. Combining ":=" with by
In the previous video, you saw how to add or update columns by reference to the original data table using the new ":=" operator. You will now add and update columns for different groups in your data table using the "by" argument.
As you can see here, a new column "n_zip_code" is added to batrips, which contains the total number of trips made for each "zip_code". When you update a data table by reference, it is updated silently: nothing is printed to the console. Checking the number of columns before and after, you can see that it increased by 1.
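As a minimal sketch of this step, the code below builds a tiny stand-in for batrips (the "trip_id" values and zip codes are made up for illustration, not taken from the real data) and adds the per-group count with ":=" and "by". It assumes the data.table package is installed.

```r
library(data.table)

# Toy stand-in for batrips (hypothetical values; the real data is much larger)
batrips <- data.table(
  trip_id  = 1:6,
  zip_code = c(94107, 94107, 94107, 94105, 94105, 95112)
)

ncol_before <- ncol(batrips)

# .N is the number of rows in each group; := adds it as a column by reference.
# Nothing is printed to the console here.
batrips[, n_zip_code := .N, by = zip_code]

ncol_after <- ncol(batrips)

# The column count grows by exactly one
ncol_after - ncol_before
```

Because ".N" is a single value per group, it is recycled to every row of that group, so each trip carries the total for its zip code.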
To view the results as soon as you update a data table by reference, you can simply chain a pair of empty square brackets, which will print the updated data table.
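The chaining trick might look like the sketch below, again on a small made-up stand-in for batrips (the values are assumptions for illustration only).

```r
library(data.table)

# Toy stand-in for batrips (hypothetical values)
batrips <- data.table(
  trip_id  = 1:6,
  zip_code = c(94107, 94107, 94107, 94105, 94105, 95112)
)

# The trailing [] prints the data table right after the update by reference
batrips[, n_zip_code := .N, by = zip_code][]
```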
3. Combining ":=" with by
Now all you need to do is use the "n_zip_code" column to filter batrips. Of course, you don't really need the column "n_zip_code" in your final result,
4. Combining ":=" with by
so you can delete it by reference by chaining one more expression as shown here.
This is a very common pattern in data analysis. You often need to add intermediate columns to get to the final result, but don't necessarily need them in the final result.
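Putting the three steps together, here is one possible version of the full chain. The data and the name "zip_big" are hypothetical, and because the toy table has only six trips, the threshold is set to 2 through an assumed variable "min_trips" rather than the 1000 used for the real batrips.

```r
library(data.table)

# Toy stand-in for batrips (hypothetical values)
batrips <- data.table(
  trip_id  = 1:6,
  zip_code = c(94107, 94107, 94107, 94105, 94105, 95112)
)

min_trips <- 2  # assumption for the toy data; the real example uses 1000

zip_big <- batrips[, n_zip_code := .N, by = zip_code][  # add count per zip code
  n_zip_code > min_trips][                              # keep busy zip codes only
  , n_zip_code := NULL]                                 # drop the helper column

zip_big
```

One detail worth knowing: the filtering step returns a new data table, so the final ":= NULL" removes "n_zip_code" from the filtered result only. The original batrips still carries the column that the first step added by reference.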
5. Let's practice!
Go ahead and practice aggregating data by reference!