The resampling trade-off

A large tech company wants to predict employee attrition so it can improve retention. Only 12% of employees have left, so the model is trained mostly on "stay" cases (88%), which makes it hard to detect employees at risk of leaving.
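To see why the imbalance matters, consider a trivial baseline that predicts "stay" for every employee. The sketch below assumes a hypothetical head count of 1,000; only the 12% / 88% split comes from the scenario.

```python
# Hypothetical head count; only the 12% / 88% split comes from the scenario.
n_employees = 1000
n_left = 120
n_stayed = n_employees - n_left

# A baseline that always predicts "stay" is right for every stayer
# and wrong for every leaver.
baseline_accuracy = n_stayed / n_employees
leavers_detected = 0

print(f"Baseline accuracy: {baseline_accuracy:.0%}")          # Baseline accuracy: 88%
print(f"Leavers detected: {leavers_detected} of {n_left}")    # Leavers detected: 0 of 120
```

Accuracy alone can therefore look high even when no at-risk employee is identified.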

To fix this imbalance, HR analysts use synthetic resampling to create more "leave" cases and balance the data.
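As a rough illustration of what synthetic resampling does, the sketch below creates new "leave" rows by interpolating between pairs of existing ones until both classes are the same size. This is a simplification and not necessarily the method the analysts used: SMOTE, for example, interpolates toward nearest neighbours rather than random partners. The function name `oversample_minority`, the two features, and the toy data are all hypothetical.

```python
import random

def oversample_minority(rows, labels, minority_label=1, seed=0):
    """Add synthetic minority rows until both classes have equal size.

    Assumes binary labels. Each synthetic row is a random point on the
    line segment between two randomly chosen minority rows.
    """
    rng = random.Random(seed)
    minority = [r for r, lab in zip(rows, labels) if lab == minority_label]
    n_majority = len(rows) - len(minority)
    n_needed = max(n_majority - len(minority), 0)

    new_rows, new_labels = list(rows), list(labels)
    if not minority:
        return new_rows, new_labels  # nothing to interpolate from

    for _ in range(n_needed):
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()  # position along the segment from a to b
        new_rows.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        new_labels.append(minority_label)
    return new_rows, new_labels

# Toy data: 88 "stay" rows (0) and 12 "leave" rows (1), two made-up features.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
labels = [0] * 88 + [1] * 12

bal_rows, bal_labels = oversample_minority(rows, labels)
print(bal_labels.count(0), bal_labels.count(1))  # 88 88
```

Resampling of this kind is normally applied to the training split only, so that the test set still reflects the real 12% / 88% mix.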

A key requirement: the model should avoid misclassifying loyal employees as "high-risk leavers" (false positives), because each such error wastes retention effort.

The model is evaluated using:

Training accuracy: the share of correct predictions on the training data.
Test accuracy: the share of correct predictions on new, unseen data.
Precision: of the employees predicted to leave, the share who actually left.
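The metric definitions above can be written in terms of confusion-matrix counts. This is a minimal sketch; the counts are hypothetical and chosen only to make the arithmetic easy to follow, with "leave" treated as the positive class.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    """Share of predicted leavers who actually left (0.0 if none predicted)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical counts for 100 employees:
# tp = leavers flagged, fp = loyal employees flagged,
# tn = loyal employees not flagged, fn = leavers missed.
tp, fp, tn, fn = 8, 2, 86, 4

print(f"Accuracy:  {accuracy(tp, fp, tn, fn):.0%}")   # Accuracy:  94%
print(f"Precision: {precision(tp, fp):.0%}")          # Precision: 80%
```

Every false positive (`fp`) is a loyal employee flagged as a leaver, so lower precision means more wasted retention effort.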

Metric	Model A (without resampling)	Model B (with resampling)
Training accuracy	85%	95%
Test accuracy	82%	85%
Precision	80%	68%
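One way to read the table is to compute, for each model, the gap between training and test accuracy and the share of flagged employees who were in fact loyal (1 minus precision). The sketch below uses only the figures from the table; the dictionary layout and key names are illustrative.

```python
results = {
    "Model A (without resampling)": {"train": 0.85, "test": 0.82, "precision": 0.80},
    "Model B (with resampling)": {"train": 0.95, "test": 0.85, "precision": 0.68},
}

for name, m in results.items():
    gap = m["train"] - m["test"]           # larger gap suggests more overfitting
    loyal_flagged = 1 - m["precision"]     # flagged employees who did not leave
    print(f"{name}: train-test gap {gap:.0%}, loyal among flagged {loyal_flagged:.0%}")

# Model A (without resampling): train-test gap 3%, loyal among flagged 20%
# Model B (with resampling): train-test gap 10%, loyal among flagged 32%
```

On these numbers, resampling raises test accuracy by 3 points, but the training-test gap widens and a larger share of flagged employees are loyal, which works against the stated requirement.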

This exercise is part of the course

Advanced Probability: Uncertainty in Data
