Selecting nested variables

1. Selecting nested variables

So far, we've mainly focused on two functions to unnest your data: unnest_wider() and unnest_longer().

2. Unnesting list columns completely

To access the deeper layers of a nested list, we kept calling these two functions one after another. However, you may not need the data in every column you unnest. Maybe you only want to select a few variables from different layers.

3. Selective unnesting with hoist()

Let's take a step back and look at the structure of this solar system data. At the top level, we have a list column named moons with eight observations, one for each planet. At the next level, we are looking at the moons of Jupiter: an unnamed list with 67 elements. Each of these elements is a named list with two elements, moon_name and moon_data, and moon_data is again a named list, with elements radius and density.

We can dig into this structure with the hoist() function. Its first argument is the list column we want to extract elements from, moons in this case. The other arguments, first_moon and radius, are the names of the two new columns we'll create from elements at different depths of the nested structure. Each path into the structure is given as a list. Both lists start with the value 1: the list of 67 moons is not named, so we can only access its elements by index, and 1 selects the first moon of each planet. The deeper levels are named, so there we can use these names to dig further.

In the result, the selected elements appear as new columns. The moons list column is still there too, but the elements we selected are no longer in it, because hoist() removes them by default.
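A minimal sketch of this hoist() call, assuming the tibble and tidyr packages are installed. The course's solar system dataset is not reproduced here, so `planet_df` is a hypothetical two-planet stand-in with the same nesting as described above and made-up values.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data with the same nesting as the solar system data
# (values are made up, not real measurements):
# - one row per planet
# - `moons` holds an unnamed list of moons per planet
# - each moon is a named list with `moon_name` and `moon_data`,
#   and `moon_data` is itself a named list with `radius` and `density`
planet_df <- tibble(
  planet = c("Jupiter", "Saturn"),
  moons = list(
    list(
      list(moon_name = "Io", moon_data = list(radius = 1, density = 10)),
      list(moon_name = "Europa", moon_data = list(radius = 2, density = 20))
    ),
    list(
      list(moon_name = "Titan", moon_data = list(radius = 3, density = 30))
    )
  )
)

# Select the name and radius of the first moon of each planet.
# The leading 1 indexes the unnamed list of moons; the strings
# that follow name the deeper, named layers.
first_moons <- hoist(
  planet_df,
  moons,
  first_moon = list(1, "moon_name"),
  radius = list(1, "moon_data", "radius")
)

first_moons
```

With this toy data, first_moon should contain "Io" and "Titan" and radius should contain 1 and 3, while the moons column keeps everything hoist() did not extract.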

4. Selective unnesting with hoist()

If we want to select the name and radius of all moons, we still need to call unnest_longer() first to get one observation per moon. We can then use hoist() on the moons column. Note that we no longer use the index 1 when selecting the elements, since unnest_longer() has already unnested the first layer of the data.
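Under the same assumptions (tibble and tidyr installed, hypothetical `planet_df` with made-up values), unnesting first and then hoisting might look like this. After unnest_longer(), each row holds a single moon, so the paths start directly with the element names.

```r
library(tibble)
library(tidyr)

# Same hypothetical toy data as before (values are made up)
planet_df <- tibble(
  planet = c("Jupiter", "Saturn"),
  moons = list(
    list(
      list(moon_name = "Io", moon_data = list(radius = 1, density = 10)),
      list(moon_name = "Europa", moon_data = list(radius = 2, density = 20))
    ),
    list(
      list(moon_name = "Titan", moon_data = list(radius = 3, density = 30))
    )
  )
)

# Step 1: one row per moon, which removes the unnamed outer layer
moons_df <- unnest_longer(planet_df, moons)

# Step 2: hoist from each moon directly; no leading index 1 is needed
moon_info <- hoist(
  moons_df,
  moons,
  moon_name = "moon_name",
  radius = list("moon_data", "radius")
)

moon_info
```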

5. Unnesting Google Maps data

Let's look at a deeply nested real-world example: JSON data from the Google Maps API. This example is inspired by the official tidyr documentation. At the top level, we have a data frame with the names of the cities we requested location data for, and the JSON response we got back from the API for each city.

6. Unnesting Google Maps data

Using our earlier approach, we can unnest the JSON data once. This reveals another list column, results, which holds just one element per city, along with the status of each request, which is OK for all cities.
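To make the next steps concrete, here is a hypothetical, heavily simplified stand-in for the API responses; real Google Maps responses contain many more fields. `gmaps_df` and the helper `make_response()` are illustrative names, and the coordinates are rounded approximations. Assuming tibble and tidyr are installed, and since each json element is a named list, the first unnesting step can be a single unnest_wider() call.

```r
library(tibble)
library(tidyr)

# Hypothetical helper building a simplified response: `results` is an
# unnamed list with one element, plus a `status` string.
make_response <- function(address, lat, lng) {
  list(
    results = list(
      list(
        formatted_address = address,
        geometry = list(
          location = list(lat = lat, lng = lng),
          location_type = "APPROXIMATE"
        )
      )
    ),
    status = "OK"
  )
}

# Hypothetical toy data; coordinates are rounded, illustrative values
gmaps_df <- tibble(
  city = c("Houston", "Chicago"),
  json = list(
    make_response("Houston, TX, USA", 29.76, -95.37),
    make_response("Chicago, IL, USA", 41.88, -87.63)
  )
)

# First unnesting step: spread json into `results` and `status`
step1 <- unnest_wider(gmaps_df, json)
step1
```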

7. Unnesting Google Maps data

We can keep unnesting this data, but as you can see, the number of columns to keep track of grows with every step.

8. Unnesting Google Maps data

Eventually, we might only be interested in the coordinates of each city. Once we unnest the geometry and location list columns, we can finally access that data. It takes five unnesting operations and one select() call to get this result.
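Continuing with the same hypothetical `gmaps_df` toy data, the five unnesting operations and the final select() could be chained as follows. This sketch additionally assumes dplyr for select() and the `%>%` pipe; the comments note which columns each step produces.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical helper and toy data (rounded, illustrative coordinates)
make_response <- function(address, lat, lng) {
  list(
    results = list(
      list(
        formatted_address = address,
        geometry = list(
          location = list(lat = lat, lng = lng),
          location_type = "APPROXIMATE"
        )
      )
    ),
    status = "OK"
  )
}

gmaps_df <- tibble(
  city = c("Houston", "Chicago"),
  json = list(
    make_response("Houston, TX, USA", 29.76, -95.37),
    make_response("Chicago, IL, USA", 41.88, -87.63)
  )
)

city_coords <- gmaps_df %>%
  unnest_wider(json) %>%      # 1: json -> results, status
  unnest_longer(results) %>%  # 2: one row per element of results
  unnest_wider(results) %>%   # 3: results -> formatted_address, geometry
  unnest_wider(geometry) %>%  # 4: geometry -> location, location_type
  unnest_wider(location) %>%  # 5: location -> lat, lng
  select(city, lat, lng)

city_coords
```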

9. Selecting Google Maps data with hoist()

Alternatively, we could use hoist() to dig into this deeply nested structure directly. We look for elements in the json list column and select the results list. Within this list, we select the first element, since there is only one, then the geometry list, then the location list, and finally the latitude or longitude variable. Accessing nested data this way is computationally more efficient and takes less code, but it does require you to know the structure of your data precisely in advance.
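A sketch of the hoist() version on the same hypothetical `gmaps_df` toy data, assuming tibble and tidyr are installed. Each path mixes element names with the index 1 for the unnamed results list, and one hoist() call replaces the five unnesting steps.

```r
library(tibble)
library(tidyr)

# Hypothetical helper and toy data (rounded, illustrative coordinates)
make_response <- function(address, lat, lng) {
  list(
    results = list(
      list(
        formatted_address = address,
        geometry = list(
          location = list(lat = lat, lng = lng),
          location_type = "APPROXIMATE"
        )
      )
    ),
    status = "OK"
  )
}

gmaps_df <- tibble(
  city = c("Houston", "Chicago"),
  json = list(
    make_response("Houston, TX, USA", 29.76, -95.37),
    make_response("Chicago, IL, USA", 41.88, -87.63)
  )
)

# json -> results -> first (only) result -> geometry -> location -> lat/lng
city_coords <- hoist(
  gmaps_df,
  json,
  lat = list("results", 1, "geometry", "location", "lat"),
  lng = list("results", 1, "geometry", "location", "lng")
)

city_coords
```

The json column stays in the output with lat and lng removed from it; the trade-off is that every path must match the data's structure exactly.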

10. Let's practice!

Now it's your turn, let's practice!