Wide Versus Long Data
In addition to tidy data, we can distinguish long data from wide data. We call a dataset long
when it has many more rows than columns, and wide when it has more columns than rows.
You have seen how to transform a wide dataset (dem_score) into a long one with gather()
and transform it into a different wide format with spread().
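As a reminder of that round trip, here is a minimal sketch using tidyr. The data frame scores_wide and its column names are made up for illustration; they are not the actual dem_score data.

```r
library(tidyr)

# Hypothetical toy data in wide format: one row per country, one column per year.
scores_wide <- data.frame(
  country = c("A", "B"),
  `1992` = c(1, 5),
  `2002` = c(2, 6),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Wide -> long: collapse every column except country into year/score pairs.
scores_long <- gather(scores_wide, key = "year", value = "score", -country)

# Long -> wide: spread the year values back out into their own columns.
scores_wide_again <- spread(scores_long, key = year, value = score)

scores_long
scores_wide_again
```

The long version has one row per country-year combination, which is the shape that grouping and plotting functions generally expect.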
In general, I tend to work with long data because this format makes it easier to aggregate the data for plots when I have a lot of covariates. Let's look at what's possible when the data is in a long format.
Let's practice with another dataset in long format, called fertilityTidy. You can look at the
original data as fertilityData. We'll summarize it in two different ways.
This exercise is part of the course RBootcamp.
Exercise instructions
- Look at fertilityTidy. Show the average fertility by country up to the present day by using dplyr verbs, calling this variable meanCountryRate.
- Assign the summarized data to fertilityMeanByCountry.
- Show fertilityMeanByCountry.
- Next, show average fertility by Year, using group_by()/summarize(), assigning the summarized data to fertilityMeanByYear.
- Show fertilityMeanByYear.
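The group_by()/summarize() pattern these steps rely on can be sketched on a small made-up data frame. The object toy_fertility and its columns (Country, Year, fertilityRate) are assumptions for illustration only; check the actual column names in fertilityTidy before adapting this.

```r
library(dplyr)

# Hypothetical stand-in for a long-format fertility table;
# the real fertilityTidy may use different column names.
toy_fertility <- data.frame(
  Country = c("A", "A", "B", "B"),
  Year = c(1960, 1961, 1960, 1961),
  fertilityRate = c(6, 4, 3, 1),
  stringsAsFactors = FALSE
)

# One row per country: average the rate across all years.
mean_by_country <- toy_fertility %>%
  group_by(Country) %>%
  summarize(meanCountryRate = mean(fertilityRate, na.rm = TRUE))

# One row per year: average the rate across all countries.
mean_by_year <- toy_fertility %>%
  group_by(Year) %>%
  summarize(meanYearRate = mean(fertilityRate, na.rm = TRUE))

mean_by_country
mean_by_year
```

Because the data is long, switching from a per-country summary to a per-year summary only means changing the grouping variable.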
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
fertilityMeanByCountry <- fertilityTidy %>%
#show fertilityMeanByCountry
fertilityMeanByCountry
fertilityMeanByYear <- fertilityTidy %>%
#show fertilityMeanByYear
fertilityMeanByYear