
Lazily transforming training data

Preprocessing your input variables is a vital step in machine learning and will often improve the accuracy of the model you create. In the last couple of exercises, the Spotify data was preprocessed for you, but it is important that you know how to do it yourself.

In this exercise, you will use a StandardScaler object, which transforms each column of the input data so that it has a mean of zero and a standard deviation of one.
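Concretely, each value in a column is replaced by its z-score. For a column with values $x_1, \dots, x_n$, mean $\mu$ and standard deviation $\sigma$, this is written below. Scikit-learn-style scalers, which dask_ml mirrors, use the population standard deviation (dividing by $n$, not $n-1$).

$$
z_i = \frac{x_i - \mu}{\sigma}, \qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}
$$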

The Dask DataFrame of Spotify songs is available in your environment as dask_df. It contains both the target popularity scores and the input variables used to predict them.

This exercise is part of the course

Parallel Programming with Dask in Python


Exercise instructions

  • Import the StandardScaler class from dask_ml.preprocessing.
  • Select the 'popularity' column from the DataFrame and assign it to the variable y.
  • Create a StandardScaler object and fit it to the X data.
  • Use the scaler to transform X.
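The two steps above, fitting and then transforming, can be illustrated without Dask. The sketch below is a hypothetical, eager NumPy class (MiniStandardScaler is not part of dask_ml or scikit-learn) that assumes small in-memory arrays and made-up feature values. It only shows the pattern: fit() learns per-column statistics once, and transform() applies them. dask_ml's StandardScaler follows the same fit/transform pattern but works on Dask collections.

```python
import numpy as np


class MiniStandardScaler:
    """Hypothetical eager stand-in for the fit/transform pattern.

    Not the dask_ml implementation: it handles in-memory NumPy arrays only.
    """

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Learn per-column mean and population standard deviation (ddof=0)
        self.mean_ = X.mean(axis=0)
        scale = X.std(axis=0)
        # Constant columns would divide by zero; leave them unscaled instead
        scale[scale == 0] = 1.0
        self.scale_ = scale
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Apply the statistics learned in fit() to every row
        return (X - self.mean_) / self.scale_


# Made-up values standing in for two features such as duration_ms and danceability
X = np.array([[200_000.0, 0.8],
              [180_000.0, 0.5],
              [240_000.0, 0.2]])

scaler = MiniStandardScaler().fit(X)
X_scaled = scaler.transform(X)

print(X_scaled.mean(axis=0))  # each column mean is (numerically) zero
print(X_scaled.std(axis=0))   # each column standard deviation is one
```

Keeping fit() separate from transform() means the statistics learned from the training data can be reused unchanged on any later data passed to transform().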

Hands-on interactive exercise

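One way to complete the sample code:

# Import the StandardScaler class
from dask_ml.preprocessing import StandardScaler

# Input variables (features) used to predict popularity
X = dask_df[['duration_ms', 'explicit', 'danceability', 'acousticness', 'instrumentalness', 'tempo']]

# Select the target variable
y = dask_df['popularity']

# Create a StandardScaler object and fit it on X
# (fit learns the mean and standard deviation of each column)
scaler = StandardScaler()
scaler.fit(X)

# Transform X lazily; the scaled values are only computed when requested
X = scaler.transform(X)

# Printing a lazy Dask collection shows its structure, not the computed values
print(X)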