PySpark MLlib algorithms
Before using any machine learning algorithm in the PySpark shell, you have to import the relevant submodule of the pyspark.mllib
library and then choose the class needed for the specific machine learning task.
In this simple exercise, you'll learn how to import the different submodules of pyspark.mllib
along with the classes needed for collaborative filtering, classification, and clustering.
This exercise is part of the course
Big Data Fundamentals with PySpark
Exercise instructions
- Import the pyspark.mllib recommendation submodule and the Alternating Least Squares class.
- Import the pyspark.mllib classification submodule and the Logistic Regression with LBFGS class.
- Import the pyspark.mllib clustering submodule and the KMeans class.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the library for ALS
from pyspark.mllib.____ import ____
# Import the library for Logistic Regression
from ____.____.____ import ____
# Import the library for KMeans
from ____.____.____ ____ ____
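As a rough sketch rather than the official solution, the template above can be read as three submodule/class pairs. The class names used here (ALS, LogisticRegressionWithLBFGS, KMeans) are the ones pyspark.mllib exposes for these tasks; the dictionary MLLIB_CLASSES and the helpers import_line and load_class are hypothetical names introduced only for illustration. The lookup falls back to None when pyspark is not installed, so the snippet also runs outside a Spark environment.

```python
import importlib

# Hypothetical lookup table: task -> (pyspark.mllib submodule, class name).
MLLIB_CLASSES = {
    "collaborative filtering": ("pyspark.mllib.recommendation", "ALS"),
    "classification": ("pyspark.mllib.classification", "LogisticRegressionWithLBFGS"),
    "clustering": ("pyspark.mllib.clustering", "KMeans"),
}


def import_line(task):
    """Build the import statement for the given task."""
    module, cls = MLLIB_CLASSES[task]
    return f"from {module} import {cls}"


def load_class(task):
    """Return the class object, or None if the import fails (e.g. no pyspark)."""
    module, cls = MLLIB_CLASSES[task]
    try:
        return getattr(importlib.import_module(module), cls)
    except ImportError:
        return None


for task in MLLIB_CLASSES:
    # Show the statement you would type in the PySpark shell for each task.
    print(f"{task}: {import_line(task)}")
```

In the PySpark shell you would type the three printed statements directly; the helper functions are only there to make the pairing between task, submodule, and class explicit.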