Lazy train-test split
You have transformed the X variables. Now you need to finish your data prep by transforming the y variables and splitting your data into train and test sets.
The variables X and y, which you created in the last exercise, are available in your environment.
This exercise is part of the course Parallel Programming with Dask in Python.
Exercise instructions
- Import the train_test_split() function from dask_ml.model_selection.
- The popularity scores in y are in the range 0-100. Divide them by 100 so they are in the range 0-1.
- Split the data into train and test sets using the train_test_split() function. Make sure to shuffle the data, and set the test fraction to 20% of the data.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the train_test_split function
from ____ import ____
# Rescale the target values
y = ____
# Split the data into train and test sets
X_train, X_test, y_train, y_test = ____
print(X_train)