
The resampling trade-off

A large tech company wants to predict employee attrition to improve retention. But only 12% of employees have left, so the model is trained mostly on "stay" cases (88%), making it hard to detect those at risk of leaving.

To fix this imbalance, HR analysts use synthetic resampling to create more "leave" cases and balance the data.
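As a rough illustration of what synthetic resampling does, the sketch below creates new "leave" rows by interpolating between pairs of existing ones until the two classes are the same size. It is a simplified, SMOTE-like stand-in and not necessarily the analysts' actual method: the toy features (tenure, satisfaction), the random data, and the function name `oversample_minority` are all assumptions made for this example, and real SMOTE interpolates toward nearest neighbours rather than randomly chosen partners.

```python
import random

def oversample_minority(rows, labels, minority=1, seed=0):
    """Return balanced copies of rows/labels by adding synthetic minority rows.

    Hypothetical helper: each synthetic row is a random point on the line
    segment between two randomly chosen minority rows.
    """
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_majority = sum(1 for y in labels if y != minority)
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")

    new_rows, new_labels = list(rows), list(labels)
    # Add synthetic rows until the minority count matches the majority count.
    for _ in range(n_majority - len(minority_rows)):
        a, b = rng.sample(minority_rows, 2)
        t = rng.random()  # interpolation weight in [0, 1)
        new_rows.append(tuple(x + t * (z - x) for x, z in zip(a, b)))
        new_labels.append(minority)
    return new_rows, new_labels

# Made-up toy data: (tenure_years, satisfaction); label 1 = left, 0 = stayed.
data_rng = random.Random(42)
rows = [(data_rng.uniform(0, 10), data_rng.uniform(0, 1)) for _ in range(100)]
labels = [1] * 12 + [0] * 88  # 12% leavers, as in the scenario

balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(balanced_labels.count(1), balanced_labels.count(0))  # 88 88
```

In practice, resampling like this would normally be applied to the training split only, so that test metrics still reflect the real 12% attrition rate.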

A key requirement: the model should avoid misclassifying loyal employees as "high-risk leavers," to prevent wasted retention efforts.

The model is evaluated using:

  • Training accuracy: the share of correct predictions on the training data.
  • Test accuracy: the share of correct predictions on new, unseen data.
  • Precision: the share of predicted leavers who actually left.

Metric             | Model A (without resampling) | Model B (with resampling)
Training accuracy  | 85%                          | 95%
Test accuracy      | 82%                          | 85%
Precision          | 80%                          | 68%
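To make the trade-off in these results concrete, the sketch below derives two quantities from the reported figures: the gap between training and test accuracy, and the number of loyal employees wrongly flagged per 100 predicted leavers, which follows directly from the definition of precision. Only the percentages come from the results above; the dictionary layout and the function names are assumptions chosen for illustration.

```python
# Reported metrics, in percent.
models = {
    "A (without resampling)": {"train_acc": 85, "test_acc": 82, "precision": 80},
    "B (with resampling)": {"train_acc": 95, "test_acc": 85, "precision": 68},
}

def generalization_gap(m):
    """Training accuracy minus test accuracy, in percentage points."""
    return m["train_acc"] - m["test_acc"]

def false_alarms_per_100(m):
    """Loyal employees among every 100 predicted leavers (100 - precision)."""
    return 100 - m["precision"]

for name, m in models.items():
    print(f"Model {name}: gap = {generalization_gap(m)} points, "
          f"false alarms per 100 flagged = {false_alarms_per_100(m)}")
# Model A (without resampling): gap = 3 points, false alarms per 100 flagged = 20
# Model B (with resampling): gap = 10 points, false alarms per 100 flagged = 32
```

Read this way, Model B gains 3 points of test accuracy but shows a wider train-test gap (10 versus 3 points), which may indicate some overfitting to the synthetic cases. It also flags 32 rather than 20 loyal employees per 100 predicted leavers, which works against the stated requirement to avoid wasted retention efforts.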

This exercise is part of the course

Advanced Probability: Uncertainty in Data
