Groups of mutants

In addition to calculating summary statistics by group, you can mutate columns with group-specific values. For example, one technique to normalize values is to subtract the mean, then divide by the standard deviation. You could perform group-specific normalization using the following code.

a_tibble %>%
  group_by(grp1, grp2) %>%
  mutate(normalized_x = (x - mean(x)) / sd(x))

A Spark connection has been created for you as spark_conn. A tibble attached to the track metadata stored in Spark has been pre-defined as track_metadata_tbl.

Group the contents of track_metadata by artist_name.
Add a new column named time_since_first_release.
- Make this equal to the groupwise year minus the first year (that is, the min() year) that the artist released a track.
Arrange the rows in descending order of time_since_first_release.

Light My Fire: Starting To Use Spark With dplyr Syntax

Tools of the Trade: Advanced dplyr Usage

Going Native: Use The Native Interface to Manipulate Spark DataFrames

Case Study: Learning to be a Machine: Running Machine Learning Models on Spark

Exercise

Groups of mutants

Instructions