1. Casting data.tables
In this lesson, you will learn how to cast data tables from long to wide formats.
2. Casting a long data.table
In the previous lesson, you learned how to melt a wide format data table, shown on the right, into a long format data table, shown on the left.
The dcast() function allows you to make the reverse transformation, reshaping a long format data table into a wide one.
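As a rough illustration of this round trip, the sketch below melts a small wide table into long format and then casts it back with dcast(). The table names sales_wide and sales_long, their columns, and all the numbers are made up for this example. Only melt() and dcast() come from data.table.

```r
library(data.table)

# Hypothetical wide-format sales table: one row per quarter, one column per year
sales_wide <- data.table(
  quarter = 1:4,
  `2015`  = c(10, 12, 9, 14),
  `2016`  = c(11, 13, 10, 15)
)

# melt(): wide -> long, giving one row per quarter/year combination
sales_long <- melt(sales_wide, id.vars = "quarter",
                   variable.name = "year", value.name = "amount")

# dcast(): long -> wide again, restoring one column per year
sales_back <- dcast(sales_long, quarter ~ year, value.var = "amount")
sales_back
```

dcast() sorts its result by the left-hand-side columns. The round trip therefore reproduces the original row order only when the wide table was already sorted by quarter, as it is here.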
3. The dcast() function
The dcast() function takes three main arguments. The first argument is the data table you want to reshape. The value.var argument takes the name of the column whose values you want to spread across multiple columns in the wide format result. Supply value.var by name, because in dcast()'s argument list the third position belongs to fun.aggregate, not value.var. The second argument is a formula describing how to split this column and which columns to keep aside as row identifiers.
The formula has three parts: a left-hand side, a tilde (the squiggly ~ symbol), and a right-hand side. The right-hand side takes the name of a column of group labels. Each unique label becomes a new column in the wide format result, containing the value.var values from the rows with that label. The left-hand side takes the name of the column holding the identifier labels for each observation. These identifiers are repeated within each group and become the rows of the result.
4. The dcast() function
Returning to our example you can see the corresponding call to the dcast() function.
Here, the "amount" column is split into two columns based on the unique labels in the "year" column, which is given to the right-hand side of the dcast() formula. The "quarter" column contains repeated identifiers: the values 1 to 4, repeated for each unique year label. This column is given to the left-hand side of the formula and kept aside as the row identifiers of the result.
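A minimal sketch of this call, assuming a made-up long table with quarter, year and amount columns. The table name and values are illustrative, not the slide's data. Note that value.var is passed by name.

```r
library(data.table)

# Hypothetical long-format table: quarters 1-4 repeated for each year
sales_long <- data.table(
  quarter = rep(1:4, times = 2),
  year    = rep(c(2015, 2016), each = 4),
  amount  = c(10, 12, 9, 14, 11, 13, 10, 15)
)

# Left-hand side (quarter): kept aside as row identifiers.
# Right-hand side (year): each unique label becomes a new column.
# value.var (amount): the values spread across those new columns.
sales_wide <- dcast(sales_long, quarter ~ year, value.var = "amount")
sales_wide
```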
5. Splitting multiple value columns
You can split multiple columns at once by supplying a character vector of column names to the value.var argument. The columns in the result will be named after the value column followed by an underscore, then each group label.
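To see the naming scheme, here is a sketch that adds a hypothetical profit column to the same made-up sales data and casts both value columns in one call. The column names and numbers are assumptions for illustration only.

```r
library(data.table)

# Hypothetical long table with two value columns
sales_long <- data.table(
  quarter = rep(1:4, times = 2),
  year    = rep(c(2015, 2016), each = 4),
  amount  = c(10, 12, 9, 14, 11, 13, 10, 15),
  profit  = c(2, 3, 1, 4, 2, 4, 2, 5)
)

# Both columns are spread by year; new names follow <value column>_<year label>
sales_wide <- dcast(sales_long, quarter ~ year,
                    value.var = c("amount", "profit"))
names(sales_wide)
```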
6. Multiple row identifiers
You can keep multiple columns aside as the row identifiers by separating their names with a plus sign in the left-hand side of the formula.
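A short sketch, assuming a hypothetical table where each row is identified by both a region and a quarter. The region labels and amounts are made up. Each combination of region and quarter becomes one row of the result.

```r
library(data.table)

# Hypothetical data: 2 regions x 2 quarters x 2 years
sales_long <- data.table(
  region  = rep(c("North", "South"), each = 4),
  quarter = rep(1:2, times = 4),
  year    = rep(rep(c(2015, 2016), each = 2), times = 2),
  amount  = c(10, 12, 11, 13, 20, 22, 21, 23)
)

# region + quarter on the left: both are kept aside as row identifiers
sales_wide <- dcast(sales_long, region + quarter ~ year, value.var = "amount")
sales_wide
```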
7. Dropping columns
Note that if you don't name a column in either the formula or the value.var argument, it won't be included in the result.
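A minimal sketch of this behavior, reusing the made-up sales data. The hypothetical profit column appears in neither the formula nor value.var, so it does not appear in the result.

```r
library(data.table)

sales_long <- data.table(
  quarter = rep(1:4, times = 2),
  year    = rep(c(2015, 2016), each = 4),
  amount  = c(10, 12, 9, 14, 11, 13, 10, 15),
  profit  = c(2, 3, 1, 4, 2, 4, 2, 5)
)

# profit is not mentioned anywhere in the call, so dcast() drops it
sales_wide <- dcast(sales_long, quarter ~ year, value.var = "amount")
"profit" %in% names(sales_wide)  # FALSE
```

Be careful when leaving out a column that helps identify rows. If the formula columns no longer identify each row uniquely, dcast() falls back to aggregating with length(), and the cells then hold counts instead of values.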
8. Multiple groupings
You can also split on multiple group label columns by using a plus sign in the right-hand side of the formula.
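As a sketch, assuming the same hypothetical region/quarter/year data as before, moving region to the right-hand side creates one column per combination of year and region. With a single value.var, the new column names join the group labels with an underscore.

```r
library(data.table)

# Hypothetical data: 2 regions x 2 quarters x 2 years
sales_long <- data.table(
  region  = rep(c("North", "South"), each = 4),
  quarter = rep(1:2, times = 4),
  year    = rep(rep(c(2015, 2016), each = 2), times = 2),
  amount  = c(10, 12, 11, 13, 20, 22, 21, 23)
)

# year + region on the right: one column per year/region combination
sales_wide <- dcast(sales_long, quarter ~ year + region, value.var = "amount")
names(sales_wide)
```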
9. Converting to a matrix
One of the reasons you might want to cast a long data table to a wide one is if you want to create a matrix of values for another function to use.
10. Converting to a matrix
You can do this with the as.matrix() function. When it is called on a data table, you can use an additional argument, "rownames", to tell the function which column to use as the matrix row names. That column is then left out of the matrix values.
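A minimal sketch, assuming the made-up quarterly sales data from earlier. The quarter column supplies the row names, and the year columns become a numeric matrix. colSums() stands in for any function that expects a matrix.

```r
library(data.table)

sales_long <- data.table(
  quarter = rep(1:4, times = 2),
  year    = rep(c(2015, 2016), each = 4),
  amount  = c(10, 12, 9, 14, 11, 13, 10, 15)
)

sales_wide <- dcast(sales_long, quarter ~ year, value.var = "amount")

# quarter becomes the row names and is not part of the matrix values
sales_mat <- as.matrix(sales_wide, rownames = "quarter")

# Pass the matrix to a function that works on matrices
colSums(sales_mat)
```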
11. Let's practice!
Now it's your turn to see how dcast() works in practice.