Lazily transforming training data
Preprocessing your input variables is a vital step in machine learning and will often improve the accuracy of the model you create. In the last couple of exercises, the Spotify data was preprocessed for you, but it is important that you know how to do it yourself.
In this exercise, you will use a StandardScaler object, which transforms each column of an array so that it has a mean of zero and a standard deviation of one.
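Column by column, the standardization is z = (x - mean) / std. The sketch below applies that formula with plain NumPy to a small made-up 4x2 array (not the Spotify data). It assumes the population standard deviation (ddof=0), which is the convention scikit-learn documents for its StandardScaler; it is an illustration of the formula, not dask_ml's implementation.

```python
import numpy as np

# Toy data: 4 rows (samples) x 2 columns (features); values are made up
data = np.array([[1.0, 200.0],
                 [2.0, 400.0],
                 [3.0, 600.0],
                 [4.0, 800.0]])

# Per-column statistics (axis=0 reduces over the rows)
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)  # NumPy's default ddof=0, i.e. population std

# Broadcasting subtracts/divides each column by its own mean/std
standardized = (data - col_mean) / col_std

print(standardized.mean(axis=0))  # approximately [0, 0]
print(standardized.std(axis=0))   # approximately [1, 1]
```

Because the second column is just the first multiplied by 200, both columns end up with identical standardized values: scaling removes differences in units and magnitude, which is the point of the preprocessing step.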
The Dask DataFrame of Spotify songs is available in your environment as dask_df. It contains both the target popularity scores and the input variables which you used to predict these scores.
This exercise is part of the course Parallel Programming with Dask in Python.
Exercise instructions
- Import the StandardScaler class from dask_ml.preprocessing.
- Select the 'popularity' column from the DataFrame and assign it to the variable y.
- Create a StandardScaler object and fit it to the X data.
- Use the scaler to transform X.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the StandardScaler class
from ____ import ____
X = dask_df[['duration_ms', 'explicit', 'danceability', 'acousticness', 'instrumentalness', 'tempo']]
# Select the target variable
y = ____
# Create a StandardScaler object and fit it on X
scaler = ____
scaler.____(____)
# Transform X
X = scaler.____
print(X)