Create the pipeline

1. Getting to know PySpark
In this chapter, you'll learn how Spark manages data and how you can read and write tables from Python.
2. Manipulating data
In this chapter, you'll learn about the pyspark.sql module, which provides optimized data queries to your Spark session.
3. Getting started with machine learning pipelines
PySpark has built-in, cutting-edge machine learning routines, along with utilities to create full machine learning pipelines. You'll learn about them in this chapter.
4. Model tuning and selection
In this last chapter, you'll apply what you've learned to create a model that predicts which flights will be delayed.

