Groups of mutants
In addition to calculating summary statistics by group, you can mutate columns with group-specific values. For example, one technique to normalize values is to subtract the mean, then divide by the standard deviation. You could perform group-specific normalization using the following code.
a_tibble %>%
  group_by(grp1, grp2) %>%
  mutate(normalized_x = (x - mean(x)) / sd(x))
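To see what group-specific normalization does, here is a minimal sketch that runs the same pipeline on a small local tibble. The data values are invented for illustration, and the example assumes plain dplyr on local data rather than a Spark-backed tibble. With sparklyr the same verbs are translated to Spark SQL, so behavior in edge cases such as missing values may differ.

```r
library(dplyr)

# Hypothetical toy data; a_tibble, grp1, grp2 and x mirror the names used above
a_tibble <- tibble(
  grp1 = c("a", "a", "a", "b", "b", "b"),
  grp2 = c("u", "u", "u", "v", "v", "v"),
  x    = c(1, 2, 3, 10, 20, 30)
)

normalized <- a_tibble %>%
  group_by(grp1, grp2) %>%
  # mean(x) and sd(x) are evaluated separately within each (grp1, grp2) group
  mutate(normalized_x = (x - mean(x)) / sd(x)) %>%
  ungroup()

normalized
```

Although the two groups have very different scales, each one ends up with normalized values of -1, 0 and 1, because the mean and standard deviation are taken per group rather than over the whole column.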
This exercise is part of the course
Introduction to Spark with sparklyr in R
Exercise instructions
A Spark connection has been created for you as spark_conn. A tibble attached to the track metadata stored in Spark has been pre-defined as track_metadata_tbl.
- Group the contents of track_metadata by artist_name.
- Add a new column named time_since_first_release.
  - Make this equal to the groupwise year minus the first year (that is, the min() year) that the artist released a track.
- Arrange the rows in descending order of time_since_first_release.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# track_metadata_tbl has been pre-defined
track_metadata_tbl
track_metadata_tbl %>%
  # Group by artist
  ___ %>%
  # Calc time since first release
  ___ %>%
  # Arrange by descending time since first release
  ___
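To check your reasoning before running against Spark, the same three steps can be tried on a small local tibble. This is only a sketch under stated assumptions: toy_tracks is a hypothetical stand-in for track_metadata_tbl, its rows are invented, and it uses local dplyr rather than a Spark connection.

```r
library(dplyr)

# Hypothetical stand-in for track_metadata_tbl; the rows are invented for illustration
toy_tracks <- tibble(
  artist_name = c("Artist A", "Artist A", "Artist A", "Artist B", "Artist B"),
  year        = c(1990, 1995, 2005, 2001, 2003)
)

result <- toy_tracks %>%
  # Group by artist so that min(year) is computed per artist
  group_by(artist_name) %>%
  # Years elapsed since that artist's earliest track in the data
  mutate(time_since_first_release = year - min(year)) %>%
  # Largest gaps first; arrange() ignores the grouping by default
  arrange(desc(time_since_first_release))

result
```

In this toy data the first row is Artist A's 2005 track, 15 years after that artist's earliest release in 1990, and each artist's earliest track gets a value of 0.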