Build a Logistic Regression model
You've already built a Decision Tree model using the flights data. Now you're going to create a Logistic Regression model on the same data.
The objective is to predict whether a flight is likely to be delayed by at least 15 minutes (label 1) or not (label 0).
Although you have a variety of predictors at your disposal, you'll only use the mon, depart and duration columns for the moment. These are numerical features which can immediately be used for a Logistic Regression model. You'll need to do a little more work before you can include categorical features. Stay tuned!
The data have been split into training and testing sets and are available as flights_train and flights_test.
Latihan ini adalah bagian dari kursus
Machine Learning with PySpark
Petunjuk latihan
- Import the class for creating a Logistic Regression classifier.
- Create a classifier object and train it on the training data.
- Make predictions for the testing data and create a confusion matrix.
Latihan interaktif praktis
Cobalah latihan ini dengan menyelesaikan kode contoh berikut.
# Import the logistic regression class
from pyspark.ml.____ import ____
# Create a classifier object and train on training data
logistic = ____().____(____)
# Create predictions for the testing data and show confusion matrix
prediction = ____.____(____)
prediction.groupBy(____, ____).____().show()