Logistic Regression model training

After creating labels and features for the data, we're ready to build a model that can learn from it (training). In this final part of the exercise, you'll split the data into training and test sets, train a Logistic Regression model on the training set, and then check the accuracy of the trained model on the test set.
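
To make the weighted split concrete, here is a minimal plain-Python sketch of one way it can work: each item gets a random draw and lands in exactly one split, so the 80:20 ratio holds only approximately. The helper `random_split` and the names `toy_samples`, `train_toy`, and `test_toy` are hypothetical stand-ins for illustration and are not part of the exercise workspace or the Spark API.

```python
import random

def random_split(items, weights, seed=42):
    """Hypothetical helper: put each item in exactly one split,
    with probability proportional to that split's weight."""
    rng = random.Random(seed)
    total = float(sum(weights))
    # Cumulative, normalized boundaries, e.g. [0.8, 1.0] for weights [0.8, 0.2]
    bounds, running = [], 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    splits = [[] for _ in weights]
    for item in items:
        draw = rng.random()
        for i, bound in enumerate(bounds):
            # The last split catches any floating-point rounding at the top end
            if draw < bound or i == len(bounds) - 1:
                splits[i].append(item)
                break
    return splits

# Toy data standing in for the `samples` RDD (assumption for illustration)
toy_samples = list(range(1000))
train_toy, test_toy = random_split(toy_samples, [0.8, 0.2])
print(len(train_toy), len(test_toy))
```

Because the assignment is random per item, the split sizes vary from run to run unless a seed is fixed, which is why the sketch takes a `seed` argument.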

Remember, you have a SparkContext sc available in your workspace, as well as the samples variable.

This exercise is part of the course Big Data Fundamentals with PySpark.


Exercise instructions

Split the combined data into training and test datasets in an 80:20 ratio.
Train the Logistic Regression model on the training dataset.
Use the trained model to predict labels for the features of the test dataset.
Combine the original labels of the test dataset with the predicted labels using the zip function.
Calculate the accuracy of the trained model as the fraction of test samples whose original and predicted labels match, and print it.
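
The last two steps can be illustrated without Spark. The following is a minimal plain-Python sketch of pairing original labels with predictions and computing accuracy; the toy label lists are made up for illustration, and Python's built-in `zip` stands in for the RDD operation of the same name.

```python
# Toy labels, invented for illustration only (not exercise data)
true_labels = [1.0, 0.0, 1.0, 1.0, 0.0]
predicted_labels = [1.0, 0.0, 0.0, 1.0, 0.0]

# Pair each original label with the prediction at the same position
labels_and_preds = list(zip(true_labels, predicted_labels))

# Keep only the pairs where the prediction matches the original label
correct = [pair for pair in labels_and_preds if pair[0] == pair[1]]

# Accuracy = matching pairs / total number of test samples
accuracy = len(correct) / float(len(true_labels))
print("Model accuracy : {:.2f}".format(accuracy))
```

Here four of the five pairs match, so the accuracy is 0.80. Pairing by position only works if both sequences have the same length and order, which is why the predictions in the exercise are produced directly from the test samples.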

Hands-on interactive exercise

Complete this sample code to finish the exercise.

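# Import the classifier (may already be loaded in the exercise workspace)
from pyspark.mllib.classification import LogisticRegressionWithLBFGS

# Split the data into training and test sets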
train_samples,test_samples = samples.____([0.8, 0.2])

# Train the model
model = LogisticRegressionWithLBFGS.train(____)

# Create a prediction label from the test data
predictions = model.____(test_samples.map(lambda x: x.features))

# Combine original labels with the predicted labels
labels_and_preds = test_samples.map(lambda x: x.label).zip(____)

# Check the accuracy of the model on the test data
accuracy = labels_and_preds.filter(lambda x: x[0] == x[____]).count() / float(test_samples.count())
print("Model accuracy : {:.2f}".format(____))
