Programming time vs readability

It is a good idea to make use of familiar functions from base R to reduce programming time without losing readability.

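The data.table package provides a special built-in variable, .SD (short for "Subset of Data"). Within each group defined by the by argument, .SD is itself a data.table holding that group's rows, with the grouping columns left out by default. When the expression in j returns a single row per group, as a summary function such as mean does, the output has one row per unique value of by.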

Recall that the by argument allows us to separate a data.table into groups. We can now use the .SD variable to reference each group and apply functions separately. For example, suppose we had a data.table storing information about dogs:

Sex	Weight	Age	Height
M	40	1	12
F	30	4	7
F	80	12	9
M	90	3	14
M	40	6	12

We could then use

dogs[, lapply(.SD, mean), by = Sex]

to produce average weights, ages, and heights for male and female dogs separately:

   Sex   Weight      Age   Height
1:   M 56.66667 3.333333 12.66667
2:   F 55.00000 8.000000  8.00000
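
For reference, the dogs example can be reproduced end to end. This is a minimal sketch that assumes the data.table package is installed and rebuilds the table from the values shown above. The object names avg and sizes are chosen here for illustration only.

```r
library(data.table)

# Rebuild the dogs table from the values listed above
dogs <- data.table(
  Sex    = c("M", "F", "F", "M", "M"),
  Weight = c(40, 30, 80, 90, 40),
  Age    = c(1, 4, 12, 3, 6),
  Height = c(12, 7, 9, 14, 12)
)

# For each Sex group, .SD holds the non-grouping columns
# (Weight, Age, Height); lapply() applies mean() to each of them
avg <- dogs[, lapply(.SD, mean), by = Sex]
print(avg)

# .SD is an ordinary data.table inside each group,
# so nrow(.SD) gives the number of dogs in that group
sizes <- dogs[, .(n = nrow(.SD)), by = Sex]
print(sizes)
```

Because .SD behaves like any other data.table, any function that accepts a list of columns can be used in place of mean here.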

A data.table DT has been created for you and is available in the workspace. Type DT in the console to print it out and inspect it.

Get the mean of columns y and z grouped by x by using .SD.
Get the median of columns y and z grouped by x by using .SD.
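
One possible approach follows the same pattern as the dogs example. The sketch below builds a hypothetical stand-in for DT, since the contents of the workspace DT are not shown here. Only the column names x, y, and z come from the instructions; the values, and the names mean_by_x and median_by_x, are made up for illustration.

```r
library(data.table)

# Hypothetical stand-in for the DT provided in the workspace
DT <- data.table(
  x = c(2, 1, 2, 1, 2, 2, 1),
  y = c(1, 3, 5, 7, 9, 11, 13),
  z = c(2, 4, 6, 8, 10, 12, 14)
)

# Mean of every non-grouping column (y and z), one row per value of x
mean_by_x <- DT[, lapply(.SD, mean), by = x]
print(mean_by_x)

# Median of every non-grouping column, one row per value of x
median_by_x <- DT[, lapply(.SD, median), by = x]
print(median_by_x)
```

Only the function passed to lapply changes between the two calls, which is what keeps this form quick to write and easy to read.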
