
The resampling trade-off

A large tech company wants to predict employee attrition to improve retention. Only 12% of employees have left, however, so the model is trained mostly on "stay" cases (88%), which makes it hard to detect employees at risk of leaving.

To fix this imbalance, HR analysts use synthetic resampling to create more "leave" cases and balance the data.
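The exercise does not say which resampling method the analysts use. As a rough illustration only, the sketch below creates synthetic "leave" rows by interpolating between random pairs of existing minority rows, a simplified stand-in for SMOTE-style methods (which interpolate between nearest neighbours). The function name `oversample_minority`, the three numeric features, and the toy data are all assumptions made for this example.

```python
import random


def oversample_minority(rows, labels, minority_label=1, seed=0):
    """Return copies of rows/labels with synthetic minority rows appended.

    Hypothetical helper: each synthetic row is a random blend of two
    existing minority rows, added until both classes have equal counts.
    """
    rng = random.Random(seed)
    minority = [r for r, lab in zip(rows, labels) if lab == minority_label]
    n_majority = len(rows) - len(minority)
    n_needed = max(n_majority - len(minority), 0)

    new_rows, new_labels = list(rows), list(labels)
    if len(minority) < 2:
        return new_rows, new_labels  # not enough rows to interpolate between

    for _ in range(n_needed):
        a, b = rng.sample(minority, 2)   # two distinct minority rows
        t = rng.random()                 # blend weight in [0, 1)
        new_rows.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        new_labels.append(minority_label)
    return new_rows, new_labels


# Toy data mirroring the 88% "stay" (0) / 12% "leave" (1) split.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
labels = [0] * 88 + [1] * 12

bal_rows, bal_labels = oversample_minority(rows, labels)
print(bal_labels.count(0), bal_labels.count(1))  # 88 88
```

Only the training data would normally be resampled this way; the test set keeps the original 88/12 split so that test metrics reflect real conditions.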

A key requirement: the model should avoid misclassifying loyal employees as "high-risk leavers," to prevent wasted retention efforts.

The model is evaluated using:

  • Training accuracy: the share of correct predictions on the training data.
  • Test accuracy: the share of correct predictions on new, unseen data.
  • Precision: the share of employees predicted to leave who actually left.
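To make these definitions concrete, here is a minimal sketch that computes accuracy and precision from confusion-matrix counts. The counts are hypothetical and chosen only for illustration (1,000 employees, 120 of whom left); they are not taken from either model in the exercise.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct: (TP + TN) / total."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted leavers who actually left: TP / (TP + FP)."""
    return tp / (tp + fp)


# Hypothetical counts, with "leave" as the positive class:
tp = 80    # predicted leave, actually left
fp = 20    # predicted leave, actually stayed (loyal employee flagged)
fn = 40    # predicted stay, actually left
tn = 860   # predicted stay, actually stayed

acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
print(f"accuracy={acc:.2f} precision={prec:.2f}")  # accuracy=0.94 precision=0.80
```

False positives are exactly the loyal employees misclassified as leavers, so precision is the metric that speaks most directly to the stated requirement.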

Metric             Model A (without resampling)   Model B (with resampling)
Training accuracy  85%                            95%
Test accuracy      82%                            85%
Precision          80%                            68%
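One way to read the table is to compute, for each model, the gap between training and test accuracy and the number of false alarms per 100 employees flagged as leavers. The short sketch below does that arithmetic using only the figures reported above; the dictionary layout and variable names are assumptions made for this example.

```python
# Reported metrics, in percent.
models = {
    "Model A (without resampling)": {"train_acc": 85, "test_acc": 82, "precision": 80},
    "Model B (with resampling)": {"train_acc": 95, "test_acc": 85, "precision": 68},
}

summary = {}
for name, m in models.items():
    gap = m["train_acc"] - m["test_acc"]   # train-test gap in percentage points
    false_alarms = 100 - m["precision"]    # loyal employees per 100 flagged
    summary[name] = (gap, false_alarms)
    print(f"{name}: gap={gap} pts, false alarms per 100 flagged={false_alarms}")

# Model A (without resampling): gap=3 pts, false alarms per 100 flagged=20
# Model B (with resampling): gap=10 pts, false alarms per 100 flagged=32
```

A wider train-test gap is commonly read as a sign of overfitting, and the lower precision means more loyal employees are flagged, which is the outcome the key requirement asks the model to avoid.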

This exercise is part of the course

Advanced Probability: Uncertainty in Data
