
Using Dask to train a linear model

Dask can train machine learning models on datasets that are too big to fit in memory. It lets you distribute data loading, preprocessing, and training across multiple threads, processes, and even multiple computers.

You have been tasked with training a machine learning model that predicts the popularity of songs in the Spotify dataset you used in previous chapters. The data has already been loaded as lazy Dask DataFrames. The input variables are available as dask_X and contain a few numeric columns, such as each song's tempo and danceability. The target values are available as dask_y and hold the popularity score of each song.

This exercise is part of the course

Parallel Programming with Dask in Python


Exercise instructions

  • Import the SGDRegressor class from sklearn.linear_model and the Incremental class from dask_ml.wrappers.
  • Create an SGDRegressor linear regression model.
  • Use the Incremental class to wrap the model so that it can be trained with a Dask dataset, and set the scoring parameter to 'neg_mean_squared_error'.
  • Fit the wrapped model using only one loop through the data.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the SGDRegressor and the Incremental wrapper
from ____ import ____
from ____ import ____

# Create an SGDRegressor model
model = ____

# Wrap the model so that it works with Dask
dask_model = ____

# Fit the wrapped model
dask_model.____