(Hey you) What's that sound?
Songs start out as an analogue thing: their sound is really a load of vibrations of air. In order to analyze a song, you need to turn it into some meaningful numbers. Tracks in the Million Song Dataset have twelve timbre measurements taken at regular time intervals throughout the song. (Timbre is a measure of the perceived quality of a sound; you can use it to distinguish voices from string instruments from percussion instruments, for example.)
In this chapter, you are going to try to predict the year a track was released, based upon its timbre. That is, you are going to use these timbre measurements to generate features for the models. (Recall that "feature" is machine learning terminology for an input variable in a model. Features are often called explanatory variables in statistics.)
The timbre data takes the form of a matrix, with rows representing the time points, and columns representing the different timbre measurements. Thus all the timbre matrices have twelve columns, but the number of rows differs from song to song. The mean of each column estimates the average of a timbre measurement over the whole song. These twelve column means can be used as twelve features for the model.
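To make the shape of the data concrete, here is a minimal sketch using a made-up matrix. The names toy_timbre and toy_mean_timbre are hypothetical and the values are random placeholders, not real Million Song Dataset measurements; the only thing the sketch assumes about real timbre data is the layout described above (time points in rows, twelve measurements in columns).

```r
# Hypothetical timbre matrix: 5 time points (rows) by 12 timbre measurements (columns).
# The values are random placeholders, not real song data.
set.seed(42)
toy_timbre <- matrix(rnorm(5 * 12), nrow = 5, ncol = 12)

# colMeans() averages down each column, giving one value per timbre measurement.
toy_mean_timbre <- colMeans(toy_timbre)

# The result has one entry per column, however many rows the song has.
length(toy_mean_timbre)  # 12
```

Because the result always has twelve entries regardless of the number of rows, songs of different lengths all produce feature vectors of the same size.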
This exercise is part of the course Introduction to Spark with sparklyr in R.
Exercise instructions
timbre, containing the timbre measurements for Lady Gaga's "Poker Face", has been pre-defined in your workspace.
- Use colMeans() to get the column means of timbre. Assign the results to mean_timbre.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# timbre has been pre-defined
timbre
# Calculate column means
(mean_timbre <- ___)