List columns
1. List columns
In the last part of this chapter, we are going to have a look at a specific data structure: list-columns.
2. What is a list column?
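A minimal sketch of a tibble with a list-column. The column names and values are assumptions chosen for illustration, not the exact content of the slide:

```r
library(tibble)

# A tibble whose second column is a list:
# each cell holds a character vector of a different length
df <- tibble(
  classic = c("a", "b", "c"),
  list = list(
    c("a"),
    c("a", "b"),
    c("a", "b", "c")
  )
)

df

# Number of elements stored in each cell of the list-column
lengths(df$list)
```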
The idea behind a dataframe with a list-column, also called a nested dataframe, is that you can put any element in one "cell" of a dataframe, instead of a scalar value. As you can see in the slide, the column called "list" contains a list of three elements, each with a specific length. In the output, the list-column is composed of cells with elements of different lengths. Note that this is awkward to do with classic dataframes: data.frame() tries to split a list into separate columns unless you wrap it in I(). Tibbles, the tidyverse implementation of dataframes, support list-columns directly.
3. Why list columns?
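A minimal sketch of the mutate() plus map() pattern. The extract_links() helper is a hypothetical stand-in for the URL extractor from the exercises: to stay self-contained, it pulls links out of a string instead of fetching a web page, and the sample data is made up:

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical helper: returns every link found in a string.
# The number of results cannot be known in advance (it may be zero).
extract_links <- function(text) {
  regmatches(text, gregexpr("https?://[^[:space:]]+", text))[[1]]
}

# Made-up data standing in for a column of pages to scan
pages <- tibble(
  page = c("home", "about", "empty"),
  content = c(
    "See https://example.com and https://example.org",
    "Only https://example.com/about here",
    "No links at all"
  )
)

# map() returns a list, so mutate() stores it as a list-column
with_links <- pages %>%
  mutate(links = map(content, extract_links))

with_links
```

Because map() always returns a list, the new column can hold two links for one row, one for another, and none for the last, all inside the same dataframe.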
Again, let's ask ourselves this question: why list-columns? First of all, you can write cleaner code with list-columns, because everything stays inside the same dataframe and you can perform every step inside the same pipe. This format allows you to combine the power of dplyr with the flexibility of purrr. For example, let's say we have a function that returns an unpredictable number of results. A good example would be the URL extractor we created in the exercises: you can't predict the number of links that will be extracted from a given URL. If we have a column of URLs inside a dataframe, we can combine mutate() and map() to create a new list-column that will contain the results.
4. Unnesting nested data.frame
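A minimal sketch of unnest() on a small nested tibble. The data is made up for illustration, and the cols argument assumes tidyr 1.0 or later:

```r
library(tibble)
library(tidyr)

# A nested tibble: the "values" column holds vectors of different lengths
nested <- tibble(
  id = c("a", "b"),
  values = list(1:2, 3:5)
)

# unnest() expands each element of the list-column into its own row
# and repeats the other columns as needed
flat <- unnest(nested, cols = values)

flat
```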
The idea, once you've performed your operations, is to get back to a standard dataframe structure. To do that, you'll need to call the unnest() function from the tidyr package. As you can see, the output is a new dataframe, with one row for each element that was stored in a cell of the list-column.
5. nest() a standard data.frame
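A minimal sketch of grouping and nesting, using the built-in iris dataset:

```r
library(dplyr)
library(tidyr)

# Group by Species, then collapse every other column into a list-column.
# nest() names that list-column "data" by default.
iris_nested <- iris %>%
  group_by(Species) %>%
  nest()

iris_nested

# Each cell of the data column is itself a dataframe for one species
iris_nested$data[[1]]
```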
We can also create a nested dataframe from a standard dataframe by combining group_by() from dplyr and nest() from tidyr. What we are doing in the slide is grouping by Species and nesting. As you can see, I now have a Species column, with one row per species, and a data column, which contains, in each cell, the subset of the original dataframe for that group.
6. A new list to map on
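A minimal sketch of mapping lm() over the nested data column. The formula Sepal.Length ~ Sepal.Width is an assumption chosen for illustration:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Fit one linear model per species and store the fitted models
# in a new list-column next to the data they were fitted on
iris_models <- iris %>%
  group_by(Species) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x)))

iris_models
```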
As this new column is a list of elements, you can use purrr functions to modify, compute, summarize, and do pretty much anything else you can think of. You can, for example, call the lm() function, and it will run on each group. That way, everything that happens in a dataframe stays in the dataframe.
7. nest() and unnest()
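A minimal sketch of the full group, nest, compute, and unnest pipeline. The formula and the intermediate column names are assumptions, so the code on the slide may differ in its details:

```r
library(dplyr)
library(tidyr)
library(purrr)

iris_rsq <- iris %>%
  group_by(Species) %>%                       # group
  nest() %>%                                  # nest: one row per species
  mutate(
    # fit one model per species
    model = map(data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x)),
    # pull the r.squared value out of each model summary
    r_squared = map(model, ~ summary(.x)$r.squared)
  ) %>%
  unnest(cols = r_squared) %>%                # unnest: back to a plain column
  select(Species, r_squared)

iris_rsq
```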
On the slide, you can see an example of how to combine dplyr, tidyr, and purrr to group the iris dataset by species and extract the r.squared of a linear model fitted on each group. In other words, we are grouping, nesting, performing the computation, and unnesting. We are writing cleaner code here because we keep the whole data manipulation inside the same pipeline, so it's easier to understand what is going on.
8. Let's practice!
Now it's your turn to try list-columns in the exercises.