Groups of mutants

In addition to calculating summary statistics by group, you can mutate columns with group-specific values. For example, one technique to normalize values is to subtract the mean, then divide by the standard deviation. You could perform group-specific normalization using the following code.

a_tibble %>%
  group_by(grp1, grp2) %>%
  mutate(normalized_x = (x - mean(x)) / sd(x))
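
To see what group-specific normalization does, here is a minimal sketch on a small local tibble rather than a Spark table. The names `scores`, `grp`, and `x` are made up for illustration and are not part of the course data.

```r
library(dplyr)

# Hypothetical toy data: two groups measured on very different scales
scores <- tibble(
  grp = c("a", "a", "a", "b", "b", "b"),
  x   = c(1, 2, 3, 10, 20, 30)
)

normalized <- scores %>%
  group_by(grp) %>%
  # mean(x) and sd(x) are computed separately within each group
  mutate(normalized_x = (x - mean(x)) / sd(x)) %>%
  ungroup()

normalized$normalized_x
# -1  0  1 -1  0  1
```

Both groups end up on the same scale because each value is compared only with the mean and standard deviation of its own group.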

This exercise is part of the course

Introduction to Spark with sparklyr in R

Exercise instructions

A Spark connection has been created for you as spark_conn. A tibble attached to the track metadata stored in Spark has been pre-defined as track_metadata_tbl.

  • Group the contents of track_metadata_tbl by artist_name.
  • Add a new column named time_since_first_release.
    • Make this equal to each track's year minus the first year (that is, the min() of year within the group) in which the artist released a track.
  • Arrange the rows in descending order of time_since_first_release.
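
The same group, mutate, arrange pattern can be tried out on a small local tibble before running it against Spark. This is only a sketch under stated assumptions: `releases` and its values are invented for illustration, and only the column names `artist_name` and `year` mirror the ones mentioned in the instructions.

```r
library(dplyr)

# Hypothetical toy data standing in for the track metadata
releases <- tibble(
  artist_name = c("A", "A", "A", "B", "B"),
  year        = c(1990, 1995, 2001, 2000, 2003)
)

result <- releases %>%
  group_by(artist_name) %>%
  # min(year) is evaluated per artist, giving that artist's first release year
  mutate(time_since_first_release = year - min(year)) %>%
  # arrange() sorts across all rows, ignoring the grouping by default
  arrange(desc(time_since_first_release))

result$time_since_first_release
# 11  5  3  0  0
```

Every artist's earliest track gets a value of 0, and later tracks count the years elapsed since that first release.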

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# track_metadata_tbl has been pre-defined
track_metadata_tbl

track_metadata_tbl %>%
  # Group by artist
  ___ %>%
  # Calc time since first release
  ___ %>%
  # Arrange by descending time since first release
  ___