
From nested values to observations

1. From nested values to observations

In the previous lesson, we saw how to unnest list columns horizontally with the unnest_wider() function.

2. The unnest_wider() function recap

Using this function on the character column in the Star Wars dataset, we ended up with two columns, name and films. Films was once again a list column, but when we used the unnest_wider() function on it,

3. The unnest_wider() function recap

we ended up with an untidy data frame. Films is a variable, and should thus remain in a single column instead of being spread out over several.

4. The unnest_longer() function

This is exactly what the unnest_longer() function will do for you. We pass it the films variable and the result is a completely unnested, tidy data frame. Rectangling deeply nested data usually comes down to a series of unnest_longer() and unnest_wider() operations, the order of which depends on the data structure.
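As a minimal sketch of the difference between the two functions, the code below rebuilds a tiny stand-in for the Star Wars data. The tibble name star_wars_tbl, the two characters, and their film lists are assumptions made up for illustration, not the course's actual dataset. names_sep is supplied in the wide version because the film vectors are unnamed.

```r
library(tidyr)
library(tibble)

# Hypothetical miniature of the Star Wars data: one named list per character
star_wars_tbl <- tibble(
  character = list(
    list(name = "Luke Skywalker", films = c("A New Hope", "Return of the Jedi")),
    list(name = "Yoda", films = "The Empire Strikes Back")
  )
)

# Named lists unnest cleanly into columns: name and films (still a list column)
characters <- star_wars_tbl %>%
  unnest_wider(character)

# Untidy: one column per film position, padded with NA for shorter lists
films_wide <- characters %>%
  unnest_wider(films, names_sep = "_")

# Tidy: films stays a single column, with one row per character-film pair
films_long <- characters %>%
  unnest_longer(films)
```

With this toy input, films_long has three rows, while films_wide keeps two rows and spreads the films over several columns.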

5. Rectangling deeply nested data

For example, we could summarize the content of this course in a nested format where the metadata on each chapter is a named list.

6. Rectangling deeply nested data

This metadata list can be unnested over several columns with the unnest_wider() function. After this first unnesting operation we see that the rightmost column, lessons, is a list column too. However, the lists inside this column are not named lists and their length varies from three to four. These are signals that we should try using the unnest_longer() function to spread the lesson lists over multiple observations.
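The two signals mentioned here, unnamed inner lists and varying lengths, can be checked directly before choosing a function. The sketch below uses a cut-down, made-up course description; course_tbl, the chapter and lesson titles, and the field names chapter_id and chapter_title are all assumptions for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical, cut-down course description: one named list per chapter
course_tbl <- tibble(
  metadata = list(
    list(
      chapter_id = 1, chapter_title = "Chapter one",
      lessons = list(
        list(id = 1, title = "Lesson 1.1"),
        list(id = 2, title = "Lesson 1.2"),
        list(id = 3, title = "Lesson 1.3")
      )
    ),
    list(
      chapter_id = 2, chapter_title = "Chapter two",
      lessons = list(
        list(id = 4, title = "Lesson 2.1"),
        list(id = 5, title = "Lesson 2.2")
      )
    )
  )
)

# Named lists per chapter -> one column per field
chapters <- course_tbl %>%
  unnest_wider(metadata)

# Signal 1: the number of lessons differs between chapters
lesson_counts <- lengths(chapters$lessons)

# Signal 2: the lists in the lessons column carry no names
has_names <- vapply(chapters$lessons, function(x) !is.null(names(x)), logical(1))
```

Varying lengths and missing names both suggest that the elements are repeated observations rather than distinct variables, which is the case unnest_longer() is meant for.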

7. Combining unnest_wider() and unnest_longer()

When we do so, we get 14 observations, one for each lesson in this course. The lessons column is still there as it turned out to be a list of named lists. If you are a bit confused at this point, don't worry. List columns tend to do that to you. Since the unnested lessons column now contains named lists of equal length, we try the unnest_wider() function on it.

8. Digging deeper

For each lesson, we now have an id, a title, and a list of the exercises it contains.

9. And deeper ...

We can keep digging deeper by unnesting the exercises over multiple observations with unnest_longer(). Note that we used dplyr's select() function to select just three columns to keep an overview. We now have 41 observations, one for each exercise in this course.

10. And deeper ...

Our final unnesting operation is unnest_wider() on the exercises named list, which contained an id for each exercise and its completion status.
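Put together, the whole sequence of alternating operations might look like the sketch below. The nested input is a small invented stand-in rather than the real course data, and the helpers exercise() and lesson(), the name course_tbl, and the chapter field names are hypothetical. The select() step also drops the lesson id, so the exercise id created in the last step does not clash with it.

```r
library(tidyr)
library(dplyr)
library(tibble)

# Hypothetical helpers that build the nested input as named lists
exercise <- function(id, complete) list(id = id, complete = complete)
lesson <- function(id, title, ...) {
  list(id = id, title = title, exercises = list(...))
}

# Hypothetical course description: one named list per chapter
course_tbl <- tibble(
  metadata = list(
    list(
      chapter_id = 1, chapter_title = "Chapter one",
      lessons = list(
        lesson(1, "Lesson 1.1", exercise(1, TRUE), exercise(2, TRUE)),
        lesson(2, "Lesson 1.2", exercise(3, TRUE), exercise(4, TRUE), exercise(5, FALSE))
      )
    ),
    list(
      chapter_id = 2, chapter_title = "Chapter two",
      lessons = list(
        lesson(3, "Lesson 2.1", exercise(6, TRUE), exercise(7, FALSE)),
        lesson(4, "Lesson 2.2", exercise(8, FALSE), exercise(9, FALSE)),
        lesson(5, "Lesson 2.3", exercise(10, FALSE), exercise(11, FALSE))
      )
    )
  )
)

exercises_tbl <- course_tbl %>%
  unnest_wider(metadata) %>%                    # chapter fields become columns
  unnest_longer(lessons) %>%                    # one row per lesson
  unnest_wider(lessons) %>%                     # lesson fields: id, title, exercises
  select(chapter_title, title, exercises) %>%   # keep an overview
  unnest_longer(exercises) %>%                  # one row per exercise
  unnest_wider(exercises)                       # exercise fields: id, complete
```

On this toy input the result has one row per exercise, with the columns chapter_title, title, id, and complete.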

11. Course status update

After all this hard work, we can use this data to calculate your progress in this course. Since TRUE and FALSE values in the complete column are really just ones and zeros, we can calculate the share of the exercises you've completed by taking the mean. You're at 78%, almost there!
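The reason the mean works is that R coerces TRUE to 1 and FALSE to 0 in arithmetic. A minimal sketch with a made-up completion vector (the values are illustrative only, not the course data):

```r
# Hypothetical completion status for four exercises
complete <- c(TRUE, TRUE, TRUE, FALSE)

# TRUE counts as 1 and FALSE as 0, so the mean is the completed share
share_complete <- mean(complete)

# Multiply by 100 to express the share as a percentage
percent_complete <- 100 * share_complete
```

Here three of the four exercises are complete, so share_complete is 0.75 and percent_complete is 75.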

12. Let's practice!

Now it's your turn. Let's practice!