(Hey you) What's that sound?

Songs start out as an analogue thing: their sound is really a load of vibrations of air. In order to analyze a song, you need to turn it into some meaningful numbers. Tracks in the Million Song Dataset have twelve timbre measurements taken at regular time intervals throughout the song. (Timbre is a measure of the perceived quality of a sound; you can use it to distinguish voices from string instruments from percussion instruments, for example.)

In this chapter, you are going to try to predict the year a track was released, based upon its timbre. That is, you are going to use these timbre measurements to generate features for the models. (Recall that a feature is machine learning terminology for an input variable in a model. Features are often called explanatory variables in statistics.)

The timbre data takes the form of a matrix, with rows representing the time points, and columns representing the different timbre measurements. Thus all the timbre matrices have twelve columns, but the number of rows differs from song to song. The mean of each column estimates the average of a timbre measurement over the whole song. These twelve column means can be used as twelve features for the model.
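
To make the shape of the data concrete, here is a minimal sketch on a made-up matrix. The names fake_timbre and fake_mean_timbre are hypothetical, and the values are random numbers rather than real song data; only the layout (time points in rows, twelve timbre measurements in columns) follows the description above.

```r
# Hypothetical stand-in for a timbre matrix: 5 time points (rows) by
# 12 timbre measurements (columns). Random values, not real song data.
set.seed(42)
fake_timbre <- matrix(rnorm(5 * 12), nrow = 5, ncol = 12)

# colMeans() averages each column over all rows (time points),
# giving one number per timbre measurement.
fake_mean_timbre <- colMeans(fake_timbre)

# The result is a numeric vector with one entry per column.
length(fake_mean_timbre)
```

However many rows a song's matrix has, the result always has twelve entries, which is what lets songs of different lengths share the same set of features.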

timbre, containing the timbre measurements for Lady Gaga's "Poker Face", has been pre-defined in your workspace.

Use colMeans() to get the column means of timbre. Assign the results to mean_timbre.
