Wide Versus Long Data

In addition to tidy data, we can distinguish between long data and wide data. We call a dataset long when it has many more rows than columns, and wide when it has more columns than rows. You have seen how to transform a wide dataset (dem_score) into a long one with gather() and how to transform it into a different wide format with spread().
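As a reminder of how the two functions fit together, here is a minimal sketch on a made-up table. The data frame `wide` and its columns (`country`, `1990`, `2000`, and the names `year` and `score`) are hypothetical and are not the actual dem_score data.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per year
wide <- data.frame(
  country = c("A", "B"),
  `1990` = c(1, 2),
  `2000` = c(3, 4),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Wide to long: stack every column except country into year/score pairs
long <- gather(wide, key = "year", value = "score", -country)

# Long back to wide: spread the year values out into columns again
wide_again <- spread(long, key = year, value = score)
```

The long version has one row per country-year combination, which is the shape that group_by() and summarize() work with most naturally.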

In general, I tend to work with long data because this format makes it easier to aggregate the data for plots when I have a lot of covariates. Let's look at what's possible when the data is in a long format.

Let's practice with another dataset in long format, called fertilityTidy. The original data is available as fertilityData if you want to compare. We'll summarize fertilityTidy in two different ways.

Look at fertilityTidy. Use dplyr verbs to compute the average fertility for each country across all years up to the present day, calling this variable meanCountryRate.
Assign the summarized data to fertilityMeanByCountry.
Show fertilityMeanByCountry.
Next, compute the average fertility by Year using group_by() and summarize(), assigning the summarized data to fertilityMeanByYear.
Show fertilityMeanByYear.
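The steps above can be sketched on a small stand-in table. This is only an illustration under stated assumptions: `fertilityToy` is a made-up data frame, and the column names `Country`, `Year`, and `fertilityRate`, as well as the summary name `meanYearRate`, are guesses that may not match the real fertilityTidy.

```r
library(dplyr)

# Toy stand-in for fertilityTidy; the column names are assumptions
fertilityToy <- data.frame(
  Country = c("A", "A", "B", "B"),
  Year = c(2000, 2001, 2000, 2001),
  fertilityRate = c(2.0, 3.0, 4.0, 5.0),
  stringsAsFactors = FALSE
)

# Average fertility per country across all years
fertilityMeanByCountry <- fertilityToy %>%
  group_by(Country) %>%
  summarize(meanCountryRate = mean(fertilityRate, na.rm = TRUE))

# Average fertility per year across all countries
fertilityMeanByYear <- fertilityToy %>%
  group_by(Year) %>%
  summarize(meanYearRate = mean(fertilityRate, na.rm = TRUE))

# Show both summaries
fertilityMeanByCountry
fertilityMeanByYear
```

Because the data is long, switching from one summary to the other only requires changing the grouping variable.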
