The resampling trade-off
A large tech company wants to predict employee attrition in order to improve retention. Only 12% of employees have left, however, so the model is trained mostly on "stay" cases (88%), which makes it hard to detect employees at risk of leaving.
To fix this imbalance, HR analysts use synthetic resampling to create more "leave" cases and balance the data.
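The text does not say which resampling method the analysts use. As a rough illustration only, the sketch below creates synthetic "leave" cases by interpolating between an existing leaver and one of its nearest fellow leavers, which is the idea behind SMOTE-style oversampling. The function name `synthetic_oversample`, the two features (tenure and satisfaction), and all data values are hypothetical and chosen only to mirror the 88/12 split.

```python
import random

def synthetic_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        partner = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to partner
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, partner)))
    return synthetic

# Hypothetical toy data: (tenure_years, satisfaction_score) per employee.
stay = [(float(i % 10), 0.5 + 0.04 * (i % 10)) for i in range(88)]
leave = [(1.0, 0.20), (2.0, 0.30), (1.5, 0.25), (3.0, 0.35),
         (0.5, 0.10), (2.5, 0.30), (1.2, 0.15), (2.2, 0.28),
         (0.8, 0.22), (1.8, 0.33), (2.8, 0.40), (3.2, 0.38)]

# Generate enough synthetic leavers to match the number of stayers.
new_leave = synthetic_oversample(leave, n_new=len(stay) - len(leave))
balanced_leave = leave + new_leave
print(len(stay), len(balanced_leave))
```

Because every synthetic case is a blend of real leavers, the balanced data contains no new information about who leaves. Resampling is normally applied to the training split only, so that test metrics still reflect the real 88/12 distribution.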
A key requirement: the model should avoid misclassifying loyal employees as "high-risk leavers," to prevent wasted retention efforts.
Each model is evaluated using:
- Training accuracy: the share of correct predictions on the training data.
- Test accuracy: the share of correct predictions on new, unseen data.
- Precision: the share of predicted leavers who actually left.
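The metrics above can be sketched from confusion-matrix counts. The counts below are hypothetical, picked only so that 12 of 100 employees are leavers; they are not taken from either model in this exercise. Training and test accuracy use the same formula and differ only in which data the counts come from.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    """Share of predicted leavers who actually left."""
    return tp / (tp + fp)

# Hypothetical counts for 100 employees, with "leave" as the positive class:
# tp = leavers flagged as leavers, fp = loyal employees flagged as leavers,
# tn = loyal employees predicted to stay, fn = leavers predicted to stay.
tp, fp, tn, fn = 8, 2, 86, 4

print(accuracy(tp, fp, tn, fn))  # 0.94
print(precision(tp, fp))         # 0.8
```

Precision is the metric tied to the stated requirement: every false positive (`fp`) is a loyal employee wrongly flagged as a high-risk leaver.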
| Metric | Model A (without resampling) | Model B (with resampling) |
|---|---|---|
| Training accuracy | 85% | 95% |
| Test accuracy | 82% | 85% |
| Precision | 80% | 68% |
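One way to read the table is to turn it into two quantities per model: the gap between training and test accuracy, and the number of loyal employees wrongly flagged per 100 flagged employees. The short sketch below does that arithmetic with the table's figures. The batch size of 100 flagged employees and the helper name `summarize` are assumptions made for illustration.

```python
# Figures from the table, as whole percentages.
models = {
    "Model A (without resampling)": {"train_acc": 85, "test_acc": 82, "precision": 80},
    "Model B (with resampling)": {"train_acc": 95, "test_acc": 85, "precision": 68},
}

def summarize(m):
    """Return (train-test accuracy gap in percentage points,
    loyal employees wrongly flagged per 100 flagged employees)."""
    gap = m["train_acc"] - m["test_acc"]
    false_alarms_per_100 = 100 - m["precision"]
    return gap, false_alarms_per_100

for name, m in models.items():
    gap, false_alarms = summarize(m)
    print(f"{name}: gap = {gap} points, false alarms per 100 flagged = {false_alarms}")
```

Under these assumptions, Model B gains 3 points of test accuracy over Model A, but its train-test gap grows from 3 to 10 points and it wrongly flags 32 rather than 20 loyal employees in every 100 flagged, which works against the requirement to avoid wasted retention efforts.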
This exercise is part of the course
Advanced Probability: Uncertainty in Data
