Lazily transforming training data

Preprocessing your input variables is a vital step in machine learning and will often improve the accuracy of the model you create. In the last couple of exercises, the Spotify data was preprocessed for you, but it is important that you know how to do it yourself.

In this exercise, you will use the StandardScaler class, which rescales each column of an array so that it has a mean of zero and a standard deviation of one.
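
To make the arithmetic concrete, here is a minimal plain-Python sketch of what standardizing a single column means: subtract the column mean, then divide by the column standard deviation. The `standardize` helper and the `tempo` values are hypothetical and only for illustration. The sketch assumes the population standard deviation; it is not the library's implementation.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (hypothetical helper)."""
    mean = fmean(values)
    std = pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / std for v in values]


# Made-up column values, for illustration only
tempo = [120.0, 95.0, 140.0, 110.0]
scaled = standardize(tempo)

print(fmean(scaled))   # approximately zero
print(pstdev(scaled))  # approximately one
```

Because each column is rescaled on its own, variables measured in very different units end up on a comparable scale.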

The Dask DataFrame of Spotify songs is available in your environment as dask_df. It contains both the target popularity scores and the input variables which you used to predict these scores.

Import the StandardScaler class from dask_ml.preprocessing.
Select the 'popularity' column from the DataFrame and assign it to the variable y.
Create a StandardScaler object and fit it to the X data.
Use the scaler to transform X.
