Come together

The features for the models you are about to run are contained in the timbre dataset, but the response (the year) is contained in the track_metadata dataset. Before you run the models, you need to join these two datasets. Each row in one dataset matches exactly one row in the other, so an inner join is appropriate: every row finds its partner, and no rows are lost.
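
The Spark tables are not available outside the course, but the join behaves the same way on local data. Below is a minimal sketch, assuming the dplyr package is installed. The two small tibbles, their values, and the timbre_mean column are hypothetical stand-ins for track_metadata_tbl and timbre_tbl, not the course data.

```r
library(dplyr)

# Hypothetical toy stand-ins for the Spark tables: one row per track in each
track_metadata <- tibble(
  track_id = c("TR001", "TR002"),
  year     = c(1999L, 2005L)
)
timbre <- tibble(
  track_id    = c("TR002", "TR001"),
  timbre_mean = c(0.4, 0.7)
)

# Inner join: keep only rows whose track_id appears in both tables,
# attaching the matching timbre columns to each metadata row
joined <- track_metadata %>%
  inner_join(timbre, by = "track_id")

print(joined)
```

Every track_id appears exactly once in each table, so the result has as many rows as either input. When the inputs are Spark tables, sparklyr translates the same inner_join() call into a Spark SQL join instead of running it in R.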

There is one more data-cleaning task. The year column contains integers, but Spark's modeling functions require real numbers, so you need to convert the year column to numeric.
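
The conversion itself is a single mutate() call. Here is a minimal local sketch, assuming dplyr is installed. The tibble and its values are hypothetical. In R, numeric means double precision, so as.numeric() turns integer years into real numbers.

```r
library(dplyr)

# Hypothetical toy table; 1999L and 2005L are R integer literals
tracks <- tibble(track_id = c("TR001", "TR002"), year = c(1999L, 2005L))
class(tracks$year)   # "integer"

# Overwrite year with its double-precision (numeric) version
tracks <- tracks %>%
  mutate(year = as.numeric(year))
class(tracks$year)   # "numeric"
```

On a Spark table, sparklyr translates the as.numeric() call inside mutate() into a Spark SQL cast rather than evaluating it in R.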

This exercise is part of the course Introduction to Spark with sparklyr in R.

Exercise instructions

A Spark connection has been created for you as spark_conn. Tibbles attached to the track metadata and timbre data stored in Spark have been predefined as track_metadata_tbl and timbre_tbl, respectively.

  • Inner join the track metadata to the timbre data by the track_id column.
  • Convert the year column to numeric.

Try this exercise by completing the sample code.

# track_metadata_tbl, timbre_tbl pre-defined
track_metadata_tbl
timbre_tbl

track_metadata_tbl %>%
  # Inner join to timbre_tbl
  ___ %>%
  # Convert year to numeric
  ___