
Using Dask to train a linear model

Dask can be used to train machine learning models on datasets that are too big to fit in memory, and allows you to distribute the data loading, preprocessing, and training across multiple threads, processes, and even across multiple computers.

You have been tasked with training a machine learning model which will predict the popularity of songs in the Spotify dataset you used in previous chapters. The data has already been loaded as lazy Dask DataFrames. The input variables are available as dask_X and contain a few numeric columns, such as the song's tempo and danceability. The target values are available as dask_y and are the popularity score of each song.

This exercise is part of the course

Parallel Programming with Dask in Python


Exercise instructions

  • Import the SGDRegressor class from sklearn.linear_model and the Incremental class from dask_ml.wrappers.
  • Create an SGDRegressor linear regression model.
  • Use the Incremental class to wrap the model so that it can be trained with a Dask dataset, and set the scoring parameter to 'neg_mean_squared_error'.
  • Fit the wrapped model using only one loop through the data.

Interactive exercise

Try this exercise by completing this sample code.

# Import the SGDRegressor and the Incremental wrapper
from ____ import ____
from ____ import ____

# Create an SGDRegressor model
model = ____

# Wrap the model so that it works with Dask
dask_model = ____

# Fit the wrapped model
dask_model.____