Exercise

Programming time vs readability

Reusing familiar base R functions, such as lapply(), inside data.table calls can reduce programming time without sacrificing readability.

The data.table package provides a special built-in symbol, .SD, which stands for "Subset of Data". Within each group defined by the by argument, .SD is itself a data.table containing that group's rows for every column except the grouping columns. When the expression in j returns a single row per group, the output has one row for each unique value of by.
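To make the behaviour of .SD concrete, here is a minimal sketch. The table toy and its columns g and v are hypothetical and are not part of the exercise; the sketch only assumes that the data.table package is installed.

```r
library(data.table)

# Hypothetical toy table (not part of the exercise): 5 rows, 2 groups
toy <- data.table(g = c("a", "b", "a", "b", "a"), v = c(1, 2, 3, 4, 5))

# Inspect .SD per group: it holds the non-grouping columns of that group's rows
info <- toy[, .(cols = paste(names(.SD), collapse = ","), rows = nrow(.SD)), by = g]
print(info)

# j returns one row per group, so the result has one row per unique value of g
sums <- toy[, lapply(.SD, sum), by = g]
print(sums)

# j returns up to two rows per group, so the result can have more rows than groups
first_two <- toy[, head(.SD, 2), by = g]
print(first_two)
```

The last call shows that the number of output rows depends on what j returns for each group, not only on the number of groups.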

Recall that the by argument allows us to separate a data.table into groups. We can now use the .SD variable to reference each group and apply functions separately. For example, suppose we had a data.table storing information about dogs:

Sex  Weight  Age  Height
M        40    1      12
F        30    4       7
F        80   12       9
M        90    3      14
M        40    6      12

We could then use

dogs[, lapply(.SD, mean), by = Sex]

to produce average weights, ages, and heights for male and female dogs separately:

   Sex   Weight      Age   Height
1:   M 56.66667 3.333333 12.66667
2:   F 55.00000 8.000000  8.00000
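The dogs example can be reproduced with the sketch below. The table is rebuilt from the values listed above. The final call uses the .SDcols argument of data.table to limit .SD to chosen columns, which the text above does not cover and is shown only as an optional extra.

```r
library(data.table)

# Rebuild the dogs table from the values listed in the text
dogs <- data.table(
  Sex    = c("M", "F", "F", "M", "M"),
  Weight = c(40, 30, 80, 90, 40),
  Age    = c(1, 4, 12, 3, 6),
  Height = c(12, 7, 9, 14, 12)
)

# Apply mean() to every non-grouping column, separately for each Sex
avg <- dogs[, lapply(.SD, mean), by = Sex]
print(avg)

# Optional: restrict .SD to selected columns with .SDcols
avg_subset <- dogs[, lapply(.SD, mean), by = Sex, .SDcols = c("Weight", "Height")]
print(avg_subset)
```

Groups appear in the order in which they first occur in the data, which is why M is listed before F.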

A data.table DT has been created for you and is available in the workspace. Type DT in the console to print it out and inspect it.

Instructions
  • Get the mean of columns y and z grouped by x by using .SD.
  • Get the median of columns y and z grouped by x by using .SD.
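One possible way to approach both instructions is sketched below. The contents of the workspace DT are not shown here, so the sketch builds a hypothetical stand-in with columns x, y, and z; only the column names are taken from the instructions, and the values are invented for illustration.

```r
library(data.table)

# Hypothetical stand-in for the workspace DT (values are invented)
DT <- data.table(
  x = c(2, 1, 2, 1, 2, 2, 1),
  y = c(1, 3, 5, 7, 9, 11, 13),
  z = c(2, 4, 6, 8, 10, 12, 14)
)

# Mean of y and z within each group of x
means <- DT[, lapply(.SD, mean), by = x]
print(means)

# Median of y and z within each group of x
medians <- DT[, lapply(.SD, median), by = x]
print(medians)
```

If DT held further columns besides x, y, and z, .SD would include them too; passing .SDcols = c("y", "z") would then limit the calculation to the two requested columns.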