
Shrinking the majority group

Rather than increasing the number of fraud cases in the dataset (over-sampling), you can randomly remove legitimate cases until the classes are more balanced. In this exercise, you will under-sample the majority class (Class = 0) in the creditcard dataset. Use table() in the console to see how many fraudulent and legitimate transactions the dataset contains.
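The idea behind random under-sampling can be sketched in base R without any extra package: keep every fraud case and draw a random subset of the legitimate cases. This is a minimal illustration, not the ROSE implementation; the data frame `toy` and the target of 30 legitimate cases are hypothetical stand-ins for the creditcard data.

```r
# Minimal base-R sketch of random under-sampling (no ROSE needed).
# `toy` is a HYPOTHETICAL stand-in for the creditcard data:
# 20 fraud cases (Class = 1) and 980 legitimate cases (Class = 0).
set.seed(42)
toy <- data.frame(
  Amount = round(runif(1000, 1, 500), 2),
  Class  = c(rep(1, 20), rep(0, 980))
)

# Inspect the class imbalance first.
table(toy$Class)

# Keep all fraud cases; randomly keep only some legitimate ones.
fraud_idx <- which(toy$Class == 1)
legit_idx <- which(toy$Class == 0)
n_legit_keep <- 30  # hypothetical target: 20 fraud + 30 legitimate -> 40% fraud
keep_idx <- c(fraud_idx, sample(legit_idx, n_legit_keep))
toy_under <- toy[keep_idx, ]

# Check the new class balance.
table(toy_under$Class)              # 0: 30, 1: 20
prop.table(table(toy_under$Class))  # 0: 0.6, 1: 0.4
```

Only the majority class loses rows, so no fraud information is discarded; the cost is that most legitimate transactions are thrown away.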

Instructions
  • Load the ROSE library.
  • Specify n_new as the number of cases the under-sampled dataset must contain so that 40% of it consists of fraud cases. To do this, divide the number of fraud cases by the desired proportion of fraud cases (0.40).
  • Under-sample the dataset.
  • Use table() and prop.table() to check the class balance of the under-sampled dataset.
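The steps above can be sketched with ROSE's ovun.sample() function, which with method = "under" keeps all minority-class cases and randomly samples majority-class cases until the result has N rows. This is a sketch under stated assumptions: it runs on a hypothetical data frame `creditcard_toy` (20 fraud, 980 legitimate cases) instead of the real creditcard data, and the seed value is arbitrary.

```r
library(ROSE)

# HYPOTHETICAL stand-in for creditcard: 20 fraud (Class = 1), 980 legitimate (Class = 0).
set.seed(42)
creditcard_toy <- data.frame(
  Amount = round(runif(1000, 1, 500), 2),
  Class  = factor(c(rep(1, 20), rep(0, 980)))
)

# Required size of the under-sampled data so that 40% of it is fraud:
n_fraud <- sum(creditcard_toy$Class == 1)
n_new <- n_fraud / 0.40  # 20 / 0.40 = 50 rows

# Under-sample the majority class to a total of n_new rows.
undersampling_result <- ovun.sample(Class ~ ., data = creditcard_toy,
                                    method = "under", N = n_new, seed = 2018)
undersampled_credit <- undersampling_result$data

# Check the class balance of the under-sampled data.
table(undersampled_credit$Class)
prop.table(table(undersampled_credit$Class))
```

On the real dataset the same code applies with creditcard in place of creditcard_toy; n_new is then the number of fraud cases divided by 0.40.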