Come together

The features for the models you are about to run are contained in the timbre dataset, but the response (the year) is contained in the track_metadata dataset. Before you run the models, you need to join these two datasets together. In this case, there is a one-to-one matching of rows between the two datasets, so you need an inner join.
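
The join semantics can be sketched locally before touching Spark. This is a minimal illustration, assuming dplyr is installed; the two toy tibbles and the timbre_means1 column are hypothetical stand-ins for the real Spark tables. Because sparklyr accepts the same dplyr verbs on Spark tibbles, the call has the same shape on track_metadata_tbl and timbre_tbl.

```r
library(dplyr)

# Hypothetical toy data standing in for the two Spark tables
track_metadata <- tibble(
  track_id = c("TR001", "TR002", "TR003"),
  year = c(1995L, 2003L, 2010L)
)
timbre <- tibble(
  track_id = c("TR003", "TR001", "TR002"),
  timbre_means1 = c(0.42, 0.17, 0.88)  # hypothetical feature column
)

# inner_join() keeps only rows whose track_id appears in both tables;
# with a one-to-one match, each track appears exactly once in the result
joined <- track_metadata %>%
  inner_join(timbre, by = "track_id")

print(joined)
```

Each row of the result pairs a track's year (the response) with its timbre features, which is the shape a modeling function needs.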

There is one more data cleaning task you need to do. The year column contains integers, but Spark modeling functions require real numbers. You need to convert the year column to numeric.
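
The type conversion can also be sketched on a local tibble. This assumes dplyr is installed and uses hypothetical toy values; on a Spark tibble the same mutate() call is translated by sparklyr into a SQL cast rather than run in R.

```r
library(dplyr)

# Hypothetical toy data: year is stored as an integer column
tracks <- tibble(track_id = c("TR001", "TR002"), year = c(1995L, 2003L))
class(tracks$year)      # "integer"

# as.numeric() converts the integer column to double (real numbers)
tracks_num <- tracks %>%
  mutate(year = as.numeric(year))
class(tracks_num$year)  # "numeric"
```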

This exercise is part of the course

Introduction to Spark with sparklyr in R

Exercise instructions

A Spark connection has been created for you as spark_conn. Tibbles attached to the track metadata and timbre data stored in Spark have been pre-defined as track_metadata_tbl and timbre_tbl respectively.

  • Inner join the track metadata to the timbre data by the track_id column.
  • Convert the year column to numeric.

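# track_metadata_tbl, timbre_tbl pre-defined
library(sparklyr)
library(dplyr)

track_metadata_tbl
timbre_tbl

track_metadata_tbl %>%
  # Inner join to timbre_tbl by the track_id column
  inner_join(timbre_tbl, by = "track_id") %>%
  # Convert year from integer to numeric (double)
  mutate(year = as.numeric(year))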