Get startedGet started for free

Deriving variables from complex column headers

1. Deriving variables from complex column headers

In the previous lesson, we saw how you can extract variables from column names.

2. Separating column headers into variables

The pivot_longer() function allowed us to do the transformation shown here, in a single function call. But sometimes, column headers can get even more confusing than in the example shown here.

3. Multiple variable combinations in column headers

Let's look at this sample of WHO data. For each country, we have the percentage of females and males that are obese and the life expectancy, also for both females and males. The complexity of these column headers lies in the fact that the sex variable is combined with not one but two other variables in the column headers.

4. Multiple variable combinations in column headers

We can visualize the messy situation like this. And the tidy format we want to get to like so. Fortunately, even this transformation can be done with just a single call to the pivot_longer() function.

5. The special .value name

In fact, the function call is pretty simple. The only special thing is the dot-value name in the vector passed to the names_to argument. The old column names were a combination of the sex followed by an underscore and then either pct-dot-obese or life-dot-exp. Through the dot-values argument, the pivot_longer() function knows that it should create column names from each string it finds in the second part of the input string after separating by an underscore. It separates by an underscore because this was specified in the names_sep argument.

6. pivot_longer() recap

This was the most complex application of the pivot_longer() function you'll see so let's do a quick recap. The function is very powerful and can completely change the format of your data. It takes some time to learn how the different arguments work, but often, the trickiest part is realizing when to use the function. Therefore, whenever your column headers contain values that seem to belong in a separate variable, check if you can use the pivot_longer() function.

7. Uncounting data

We'll finish up this lesson with a function that is not connected to pivot_longer() but does illustrate the wealth of transformations you can do with tidyr. The data sample here shows the number of nuclear bombs detonated per country. If we want to reshape our data so that we once again have one row for each bomb that was dropped

8. The uncount() function

we can use tidyr's uncount() function. We specify that we want to use n_bombs as the variable to uncount by and then each country will get n_bomb rows in the output.

9. The uncount() function

We can also pass the uncount() function a constant, two in this case, and then each row in the data will be duplicated that number of times.

10. The uncount() function

If we want to add an identifier for each bomb per country, we can do so with the dot-id argument. We just pass it a string with the name of the id column to add. bomb_id in this case.

11. Let's practice!

Now it's your turn, let's practice!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.