The resampling trade-off
A large tech company wants to predict employee attrition to improve retention. Only 12% of employees have left, so the model is trained mostly on "stay" cases (88%), which makes it hard to detect employees at risk of leaving.
To fix this imbalance, HR analysts use synthetic resampling to create more "leave" cases and balance the data.
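The scenario does not say which resampling method the analysts use. As a minimal sketch, assuming a SMOTE-style approach, new "leave" rows can be created by interpolating between pairs of existing "leave" rows. The function name `synthesize_minority`, the two numeric features, and the toy data are all hypothetical. Real SMOTE interpolates toward nearest neighbours rather than random pairs.

```python
import random


def synthesize_minority(minority, n_new, seed=0):
    """Create n_new synthetic rows by interpolating between random pairs
    of minority-class rows (a simplified, SMOTE-like idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # two distinct minority rows
        t = rng.random()                # interpolation weight in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


# Hypothetical toy training set mirroring the 88% / 12% split:
# 88 "stay" rows and 12 "leave" rows with two numeric features each.
data_rng = random.Random(42)
stay = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(88)]
leave = [[data_rng.gauss(2.0, 1.0), data_rng.gauss(2.0, 1.0)] for _ in range(12)]

# Add just enough synthetic "leave" rows to match the "stay" count.
new_leave = synthesize_minority(leave, n_new=len(stay) - len(leave))
balanced_leave = leave + new_leave

print(len(stay), len(balanced_leave))  # 88 88
```

In practice, resampling is normally applied to the training split only, so that the test set still reflects the real 12% attrition rate.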
A key requirement: the model should avoid misclassifying loyal employees as "high-risk leavers," to prevent wasted retention efforts.
The model is evaluated using:
- Training accuracy: the share of correct predictions on the training data.
- Test accuracy: the share of correct predictions on new, unseen data.
- Precision: the share of predicted leavers who actually left.
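To make the metrics above concrete, here is a small sketch that computes accuracy and precision from confusion-matrix counts. The counts are hypothetical (1,000 employees, 120 of whom left, matching the 12% rate) and are not taken from the exercise. Returning 0.0 when nobody is flagged is a convention chosen for this sketch.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of predicted leavers who actually left.
    Returns 0.0 if nobody was predicted to leave (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical model: flags 75 employees, 60 of whom really left.
tp, fp, tn, fn = 60, 15, 865, 60
print(accuracy(tp, fp, tn, fn))  # 0.925
print(precision(tp, fp))         # 0.8

# Baseline that always predicts "stay": high accuracy, no leavers found.
print(accuracy(0, 0, 880, 120))  # 0.88
print(precision(0, 0))           # 0.0
```

The baseline shows why accuracy alone can mislead on imbalanced data: predicting "stay" for everyone already scores 88% without identifying a single leaver.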
| Metric | Model A (without resampling) | Model B (with resampling) |
|---|---|---|
| Training accuracy | 85% | 95% |
| Test accuracy | 82% | 85% |
| Precision | 80% | 68% |
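One way to read the table is to compare each model's gap between training and test accuracy, and to turn precision into the number of loyal employees wrongly flagged per 100 predicted leavers. The sketch below uses only the figures from the table. The helper name `summarize` and the dictionary layout are illustrative choices.

```python
# Figures copied from the table, in percent.
results = {
    "Model A (without resampling)": {"train_acc": 85, "test_acc": 82, "precision": 80},
    "Model B (with resampling)": {"train_acc": 95, "test_acc": 85, "precision": 68},
}


def summarize(results):
    """Return the train/test accuracy gap (percentage points) and the
    number of false alarms per 100 predicted leavers for each model."""
    summary = {}
    for name, m in results.items():
        summary[name] = {
            "gap": m["train_acc"] - m["test_acc"],
            "false_alarms_per_100": 100 - m["precision"],
        }
    return summary


summary = summarize(results)
for name, s in summary.items():
    print(f"{name}: gap = {s['gap']} points, "
          f"false alarms per 100 flagged = {s['false_alarms_per_100']}")
# Model A (without resampling): gap = 3 points, false alarms per 100 flagged = 20
# Model B (with resampling): gap = 10 points, false alarms per 100 flagged = 32
```

Model B's larger gap hints at more overfitting to the resampled training data, and its lower precision means more loyal employees are flagged as leavers, which is the outcome the key requirement is meant to avoid.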
This exercise is part of the course Advanced Probability: Uncertainty in Data.
