Evaluate the Logistic Regression model
Accuracy is generally not a very reliable metric because it can be biased by the most common target class.
There are two other useful metrics:
- precision and
- recall.
Check the slides for this lesson to get the relevant expressions.
Precision is the proportion of positive predictions which are correct. For all flights which are predicted to be delayed, what proportion is actually delayed?
Recall is the proportion of positive outcomes which are correctly predicted. For all delayed flights, what proportion is correctly predicted by the model?
The precision and recall are generally formulated in terms of the positive target class. But it's also possible to calculate weighted versions of these metrics which look at both target classes.
The components of the confusion matrix are available as TN, TP, FN and FP, as well as the model predictions in the object prediction.
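As a concrete sketch of these two definitions (the counts below are made up for illustration; only the names TN, TP, FN and FP come from the exercise):

```python
# Hypothetical confusion-matrix counts (illustrative values only)
TP = 80   # delayed flights correctly predicted as delayed
FP = 20   # flights predicted delayed that actually departed on time
FN = 40   # delayed flights the model missed
TN = 60   # on-time flights correctly predicted as on time

precision = TP / (TP + FP)  # of all "delayed" predictions, how many were right
recall = TP / (TP + FN)     # of all actually delayed flights, how many were caught

print('precision = {:.2f}\nrecall = {:.2f}'.format(precision, recall))
# precision = 0.80
# recall = 0.67
```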
This exercise is part of the course
Machine Learning with PySpark
Exercise instructions
- Find the precision and recall.
- Create a multi-class evaluator and evaluate weighted precision.
- Create a binary evaluator and evaluate AUC using the "areaUnderROC" metric.
Hands-on interactive exercise
Try this exercise by completing this sample code.
from pyspark.ml.evaluation import MulticlassClassificationEvaluator, BinaryClassificationEvaluator
# Calculate precision and recall
precision = ____
recall = ____
print('precision = {:.2f}\nrecall = {:.2f}'.format(precision, recall))
# Find weighted precision
multi_evaluator = ____
weighted_precision = multi_evaluator.____(prediction, {multi_evaluator.metricName: "____"})
# Find AUC
binary_evaluator = ____
auc = binary_evaluator.____(____, {____})