The resampling trade-off

A large tech company wants to predict employee attrition to improve retention. Only 12% of the employees in the data have left, so a model trained on it sees mostly "stay" cases (88%) and struggles to detect employees at risk of leaving.

To address this imbalance, HR analysts use synthetic resampling to generate additional "leave" cases and balance the two classes.
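
The exercise does not say which resampling method the analysts use. As a rough illustration only, the sketch below creates synthetic "leave" rows by interpolating between randomly chosen pairs of real minority rows. This is a simplified, SMOTE-style idea (real SMOTE interpolates toward nearest neighbours); the function name, the feature columns, and the toy values are all hypothetical.

```python
import random

def oversample_minority(minority, n_new, seed=0):
    """Return n_new synthetic rows, each a random point on the line
    segment between two distinct, randomly chosen minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)   # two different real "leave" rows
        t = rng.random()                 # interpolation weight in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic

# Hypothetical toy data: [tenure_years, satisfaction_score].
# 22 "stay" rows and 3 "leave" rows, i.e. 12% leavers as in the exercise.
stay = [[float(i % 10), 0.5 + 0.02 * (i % 10)] for i in range(22)]
leave = [[1.0, 0.2], [2.0, 0.3], [1.5, 0.1]]

# Generate just enough synthetic leavers to match the majority class.
synthetic_leave = oversample_minority(leave, n_new=len(stay) - len(leave))
balanced_leave = leave + synthetic_leave

print(len(stay), len(balanced_leave))
```

In practice, resampling of this kind is normally applied to the training split only, so that the test set still reflects the original 12% attrition rate.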

A key requirement: the model should avoid misclassifying loyal employees as "high-risk leavers" (false positives), so that retention effort is not wasted on people who were never going to leave.

The model is evaluated using:

  • Training accuracy: the share of correct predictions on the training data.
  • Test accuracy: the share of correct predictions on new, unseen data.
  • Precision: the share of predicted leavers who actually left.

Metric             Model A (without resampling)   Model B (with resampling)
Training accuracy  85%                            95%
Test accuracy      82%                            85%
Precision          80%                            68%
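
One way to read the table is to compute two derived quantities for each model: the gap between training and test accuracy, and the share of flagged employees who were in fact loyal (1 minus precision). The sketch below does that arithmetic using only the figures from the table; the dictionary layout and the helper name `summarize` are illustrative choices, not part of the exercise.

```python
# Figures copied from the table above, expressed as fractions.
models = {
    "A (without resampling)": {"train_acc": 0.85, "test_acc": 0.82, "precision": 0.80},
    "B (with resampling)": {"train_acc": 0.95, "test_acc": 0.85, "precision": 0.68},
}

def summarize(m):
    """Return (train-test accuracy gap, share of predicted leavers who stayed)."""
    gap = m["train_acc"] - m["test_acc"]
    false_alarm_share = 1.0 - m["precision"]
    # Round to two decimals to hide floating-point noise.
    return round(gap, 2), round(false_alarm_share, 2)

summary = {name: summarize(m) for name, m in models.items()}

for name, (gap, false_alarms) in summary.items():
    print(f"Model {name}: train-test gap = {gap:.0%}, "
          f"loyal employees among those flagged = {false_alarms:.0%}")
```

On these numbers, Model B gains 3 points of test accuracy, but its train-test gap widens from 3 to 10 points and the share of loyal employees among those it flags rises from 20% to 32%. Given the stated requirement to avoid flagging loyal employees, that drop in precision is the cost of resampling here.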

This exercise is part of the course

Advanced Probability: Uncertainty in Data
