Mutating columns

It may surprise you, but not all datasets start out perfectly clean! Often you have to fix values or create new columns derived from your existing data. In dplyr terminology, changing or adding columns is called mutation, and it is performed with mutate(). This function takes a tibble and named arguments that update columns: the name of each argument is the name of the column to change or add, and its value is an expression explaining how to update it. For example, given a tibble with columns x and y, the following code updates x and creates a new column z.

a_tibble %>%
  mutate(
    x = x + y,  # overwrite x using the existing x and y
    z = log(x)  # create z from the freshly updated x
  )

In case you hadn't already got the message that base-R functions don't work with Spark tibbles: you can't use within() or transform() for this purpose.
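The same pattern works on a tibble that points to data stored in Spark, because sparklyr translates the dplyr verbs into Spark SQL. Here is a minimal sketch, assuming a local Spark installation; the connection and the small numbers table are purely illustrative and not part of the exercise.

library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (assumes Spark is installed locally)
sc <- spark_connect(master = "local")

# Copy a tiny data frame to Spark; "numbers" is just an illustrative name
numbers_tbl <- copy_to(sc, tibble::tibble(x = 1:3, y = 4:6), "numbers")

numbers_tbl %>%
  mutate(
    total = x + y,   # new column derived from existing columns
    x_log = log(x)   # another new column, from an existing column
  )

spark_disconnect(sc)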

This exercise is part of the course Introduction to Spark with sparklyr in R.

Exercise instructions

A Spark connection has been created for you as spark_conn. A tibble attached to the track metadata stored in Spark has been pre-defined as track_metadata_tbl.

  • Select the title and duration fields. Note that the durations are in seconds.
  • Pipe the result of this to mutate() to create a new field, duration_minutes, that contains the track duration in minutes.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# track_metadata_tbl has been pre-defined
track_metadata_tbl

# Manipulate the track metadata
track_metadata_tbl %>%
  # Select columns
  ___ %>%
  # Mutate columns
  ___
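For reference, here is one way the completed pipeline might look, assuming track_metadata_tbl has the title and duration (seconds) columns described above.

track_metadata_tbl %>%
  # Select the title and duration columns
  select(title, duration) %>%
  # Add duration_minutes by converting seconds to minutes
  mutate(duration_minutes = duration / 60)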