
Evaluate the Logistic Regression model

Accuracy is generally not a reliable metric because it can be inflated by the most common target class: a model that always predicts the majority class will still score well on an imbalanced dataset.

There are two other useful metrics:

  • precision and
  • recall.

Check the slides for this lesson for the relevant expressions.

Precision is the proportion of positive predictions that are correct: of all flights predicted to be delayed, what proportion is actually delayed?

Recall is the proportion of positive outcomes that are correctly predicted: of all flights that are actually delayed, what proportion does the model identify?
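
Written in terms of the confusion-matrix counts (TP = true positives, FP = false positives, FN = false negatives), the standard expressions are:

\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]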

Precision and recall are generally formulated in terms of the positive target class, but it's also possible to calculate weighted versions of these metrics, which take both target classes into account.
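
For reference, Spark's weighted precision is the average of the per-class precisions, weighted by how often each class occurs:

\[
\text{weighted precision} = \sum_{c} \frac{n_c}{N}\,\text{precision}(c)
\]

where \(n_c\) is the number of instances with label \(c\) and \(N\) is the total number of instances.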

The components of the confusion matrix are available as TN, TP, FN and FP, along with the DataFrame of model predictions, prediction.
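
As a minimal sketch, assuming prediction is a DataFrame with numeric label and prediction columns, these counts could be derived as follows:

# Derive the confusion-matrix counts from the predictions
# (a sketch; assumes numeric 'label' and 'prediction' columns)
TN = prediction.filter('prediction = 0 AND label = 0').count()
TP = prediction.filter('prediction = 1 AND label = 1').count()
FN = prediction.filter('prediction = 0 AND label = 1').count()
FP = prediction.filter('prediction = 1 AND label = 0').count()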

This exercise is part of the course Machine Learning with PySpark.

Exercise instructions

  • Find the precision and recall.
  • Create a multi-class evaluator and evaluate weighted precision.
  • Create a binary evaluator and evaluate AUC using the "areaUnderROC" metric.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

from pyspark.ml.evaluation import MulticlassClassificationEvaluator, BinaryClassificationEvaluator

# Calculate precision and recall
precision = ____
recall = ____
print('precision = {:.2f}\nrecall    = {:.2f}'.format(precision, recall))

# Find weighted precision
multi_evaluator = ____
weighted_precision = multi_evaluator.____(prediction, {multi_evaluator.metricName: "____"})

# Find AUC
binary_evaluator = ____
auc = binary_evaluator.____(____, {____})
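
One possible completion, assuming the TP, FP and FN counts and the prediction DataFrame described above are in scope (and that prediction carries Spark's default label and rawPrediction columns):

from pyspark.ml.evaluation import MulticlassClassificationEvaluator, BinaryClassificationEvaluator

# Precision and recall from the confusion-matrix counts
precision = TP / (TP + FP)
recall = TP / (TP + FN)
print('precision = {:.2f}\nrecall    = {:.2f}'.format(precision, recall))

# Weighted precision: per-class precision averaged by class frequency
multi_evaluator = MulticlassClassificationEvaluator()
weighted_precision = multi_evaluator.evaluate(prediction, {multi_evaluator.metricName: "weightedPrecision"})

# AUC: area under the ROC curve (1.0 = perfect, 0.5 = random guessing)
binary_evaluator = BinaryClassificationEvaluator()
auc = binary_evaluator.evaluate(prediction, {binary_evaluator.metricName: "areaUnderROC"})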