Using Dask to train a linear model
Dask lets you train machine learning models on datasets that are too big to fit in memory, and distribute the data loading, preprocessing, and training across multiple threads, processes, or even multiple computers.
You have been tasked with training a machine learning model which will predict the popularity of songs in the Spotify dataset you used in previous chapters. The data has already been loaded as lazy Dask DataFrames. The input variables are available as dask_X and contain a few numeric columns, such as the song's tempo and danceability. The target values are available as dask_y and are the popularity score of each song.
This exercise is part of the course
Parallel Programming with Dask in Python
Instructions
- Import the SGDRegressor class from sklearn.linear_model and the Incremental class from dask_ml.wrappers.
- Create a SGDRegressor linear regression model.
- Use the Incremental class to wrap the model so that it can be trained with a Dask dataset, and set the scoring parameter to 'neg_mean_squared_error'.
- Fit the wrapped model using only one loop through the data.
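For background, the Incremental wrapper targets estimators that can learn from one block of data at a time through scikit-learn's partial_fit method, which SGDRegressor provides. The sketch below illustrates that idea with plain NumPy and scikit-learn, without Dask. The data is synthetic, and the chunk size of 100 rows and the coefficients 3.0 and -2.0 are arbitrary choices for illustration, not values from the exercise.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic stand-in data (an assumption; not the Spotify dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = X @ np.array([3.0, -2.0]) + rng.normal(scale=0.1, size=1000)

model = SGDRegressor(random_state=0)

# One loop through the data, 100 rows at a time.
# Each partial_fit call updates the same model with only the current chunk,
# so the full dataset never has to be passed to the model in one piece.
chunk_size = 100
for start in range(0, len(X), chunk_size):
    stop = start + chunk_size
    model.partial_fit(X[start:stop], y[start:stop])

print(model.coef_)
```

Because the model only ever sees one chunk per call, the same pattern extends to data that is loaded lazily, block by block.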
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Import the SGDRegressor and the Incremental wrapper
from ____ import ____
from ____ import ____
# Create a SGDRegressor model
model = ____
# Wrap the model so that it works with Dask
dask_model = ____
# Fit the wrapped model
dask_model.____