Exercise

Lazily transforming training data

Preprocessing your input variables is a vital step in machine learning and will often improve the accuracy of the model you create. In the last couple of exercises, the Spotify data was preprocessed for you, but it is important that you know how to do it yourself.

In this exercise, you will use a StandardScaler object, which transforms each column of an array so that it has a mean of zero and a standard deviation of one.
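To make "mean of zero and standard deviation of one" concrete, here is a minimal sketch of the same per-column arithmetic written by hand in NumPy on a small made-up array. It is an illustration of the math, not dask_ml's implementation. It divides by the population standard deviation (ddof=0), which is what scikit-learn's StandardScaler uses.

```python
import numpy as np

# A small made-up feature matrix: rows are samples, columns are features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

# Standardize each column: subtract its mean, then divide by its
# population standard deviation (NumPy's std uses ddof=0 by default).
means = X.mean(axis=0)
stds = X.std(axis=0)
X_std = (X - means) / stds

print(X_std.mean(axis=0))  # each column's mean is now (approximately) 0
print(X_std.std(axis=0))   # each column's std is now (approximately) 1
```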

The Dask DataFrame of Spotify songs is available in your environment as dask_df. It contains both the target popularity scores and the input variables that you used to predict these scores.

Instructions

  • Import the StandardScaler class from dask_ml.preprocessing.
  • Select the 'popularity' column from the DataFrame and assign it to the variable y.
  • Create a StandardScaler object and fit it to the input data, X.
  • Use the fitted scaler to transform X.