1
Tidy Data
You'll be introduced to the concept of tidy data, which is central to this course. In the first two lessons, you'll jump straight into the action by separating messy character columns into tidy variables and observations ready for analysis. In the final lesson, you'll learn how to overwrite and remove missing values.
2
From Wide to Long and Back
This chapter is all about pivoting data from a wide to long format and back again using the pivot_longer() and pivot_wider() functions. You'll need these functions when variables are hidden in messy column names or when variables are stored in rows instead of columns. You'll learn about space dogs, nuclear bombs, and planet temperatures along the way.
3
Expanding Data
Values can often be missing in your data, and sometimes entire observations are absent too. In this chapter, you'll learn how to complete your dataset with these missing observations. You'll add observations with zero values to counted data, expand time series to a full sequence of intervals, and more!
4
Rectangling Data
In the final chapter, you'll learn how to turn nested data structures such as JSON and XML files into tidy, rectangular data. This skill will enable you to process data from web APIs. You'll also learn how nested data structures can be used to write elegant modeling pipelines that produce tidy outputs.

WHO obesity vs. life expectancy

You've been given a sample of WHO data (who_df) with obesity percentages and life expectancy data per country, year, and sex. You want to visually inspect the correlation between obesity and life expectancy.

However, the data is very messy, with four variables hidden in the column names. Each column name is made up of three parts separated by underscores: the year, then the sex, and then either pct.obese or life.exp. Since the third part of the column name holds the names of two variables rather than values of one, you'll need to use the special ".value" value in the names_to argument.

You'll pivot the data into a tidy format and create the scatterplot.

The ggplot2 package has been pre-loaded for you.

Pivot the data so that each variable (year, sex, pct.obese, life.exp) has a column of the correct data type.
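
One possible way to approach this step is sketched below. The real who_df is not shown here, so the sketch builds a tiny stand-in with invented values and assumes there is an identifier column named country; only the column-name pattern (year_sex_variable) comes from the exercise text. It also assumes tidyr 1.1.0 or later, where pivot_longer() accepts names_transform.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for who_df: values and the `country` column name are
# assumptions made for illustration, not the course data.
who_df <- data.frame(
  country = c("A", "B"),
  `1980_male_life.exp` = c(60.1, 70.2),
  `1980_male_pct.obese` = c(5.0, 10.0),
  `1980_female_life.exp` = c(64.3, 76.5),
  `1980_female_pct.obese` = c(8.0, 12.0),
  check.names = FALSE  # keep the messy column names exactly as written
)

tidy_df <- pivot_longer(
  who_df,
  cols = -country,
  # The first two name parts become values of `year` and `sex`;
  # ".value" turns the third part into output column names.
  names_to = c("year", "sex", ".value"),
  names_sep = "_",
  # Convert year from character to integer
  names_transform = list(year = as.integer)
)

# Scatterplot of obesity percentage against life expectancy
p <- ggplot(tidy_df, aes(x = pct.obese, y = life.exp, color = sex)) +
  geom_point()
print(p)
```

Because the variable names pct.obese and life.exp contain dots rather than underscores, splitting on "_" yields exactly three parts per column name.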