From long to wide data

1. From long to wide data

So far in this chapter, we've focused on moving variables within column headers to their own columns. This typically reduces the number of columns and increases the number of rows, making the data less wide but longer. But what if we want to do the opposite?

2. Variable names in a column

Consider this sample of WHO data. For each country, we have a life expectancy estimate and obesity percentage, but the variable names, life_exp and pct_obese, are stored in a column named metric.

3. Variable names in a column

We can visualize the issue like this, and the tidy format we want to get to like so. Note that this format is more concise since we don't store any duplicate values.

4. The pivot_wider() function

To achieve this result, we'll use a function with the opposite effect of the pivot_longer() function, named pivot_wider(). In its most basic form, pivot_wider() needs two arguments: names_from and values_from. The names_from argument specifies the column from which to derive the new column names; in our dataset, this is the metric column. The values_from argument specifies the column that holds the values for these new columns.
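As a minimal sketch of this basic call: the country names and numbers below are made up for illustration and are not actual WHO figures, and the name of the values column (value) is an assumption, since the narration only names the metric column.

```r
library(tidyr)
library(tibble)

# Hypothetical long-format sample: one row per country and metric.
# The numbers are placeholders, not real WHO estimates.
who_long <- tribble(
  ~country,    ~metric,     ~value,
  "Country A", "life_exp",  80.1,
  "Country A", "pct_obese", 20.5,
  "Country B", "life_exp",  72.3,
  "Country B", "pct_obese", 10.2
)

# names_from: the column whose entries become the new column names.
# values_from: the column whose entries fill those new columns.
who_wide <- pivot_wider(who_long, names_from = metric, values_from = value)

print(who_wide)
```

The result should have one row per country, with life_exp and pct_obese as separate columns.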

5. The pivot_wider() function

If we want, we can further specify a names_prefix argument, and pass it a string to use as a prefix for the newly created column names, "national_" in this case.
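A short sketch of the names_prefix argument follows, again on made-up data: the countries, numbers, and the value column name are assumptions for illustration only.

```r
library(tidyr)
library(tibble)

# Hypothetical long-format sample (placeholder numbers, not real WHO data)
who_long <- tribble(
  ~country,    ~metric,     ~value,
  "Country A", "life_exp",  80.1,
  "Country A", "pct_obese", 20.5,
  "Country B", "life_exp",  72.3,
  "Country B", "pct_obese", 10.2
)

# names_prefix is prepended to every column name derived from `metric`
who_prefixed <- pivot_wider(
  who_long,
  names_from   = metric,
  values_from  = value,
  names_prefix = "national_"
)

print(names(who_prefixed))
```

Only the newly created columns receive the prefix; the country column keeps its name.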

6. Transposing a data frame

Sometimes, you're faced with a dataset like this. It's almost in a tidy format, except for the fact that the variables are rows instead of columns and that the observations are columns instead of rows.

7. Transposing a data frame

We can visualize the issue like this, and the tidy format we want to get to, like so. Turning all rows into columns and vice-versa is known as transposing your data.

8. Transposing a data frame: step 1

But to do it using tidyr functions, we'll need two steps. The first is to use the pivot_longer() function to put the variable hidden in the column names in a separate column, year in this case.

9. Transposing a data frame: step 2

Then, we'll use the pivot_wider() function to put the people_on_moon and nuclear_bombs variables in their own columns. The data is then in the desired, tidy format.
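The two steps can be sketched together as below. The numbers are placeholders rather than real counts, and the names metric (for the column holding the variable names) and value (for the intermediate values column) are assumptions, since the narration does not name them.

```r
library(tidyr)
library(tibble)

# Hypothetical "sideways" data: variables as rows, years as columns.
# All numbers are placeholders for illustration.
space_sideways <- tribble(
  ~metric,          ~`1969`, ~`1970`, ~`1971`,
  "people_on_moon", 4,       0,       4,
  "nuclear_bombs",  82,      85,      59
)

# Step 1: move the year hidden in the column names into its own column
space_long <- pivot_longer(
  space_sideways,
  cols      = -metric,
  names_to  = "year",
  values_to = "value"
)

# Column names arrive as text, so convert year to a number
space_long$year <- as.integer(space_long$year)

# Step 2: give each variable stored in `metric` its own column
space_tidy <- pivot_wider(space_long, names_from = metric, values_from = value)

print(space_tidy)
```

After both steps, each row is one year and each variable has its own column, which is the transposed layout described above.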

10. Let's practice!

Now it's your turn. Let's practice!