Exercise

Random over-sampling

Only a very small fraction of the credit transfers are fraudulent. You're now going to over-sample the fraud cases in order to balance the class distribution. The feature Class in the dataset creditcard takes the value 1 in case of fraud and 0 otherwise.

You can use the console to display the columns of creditcard with str(), print the first 6 rows of the dataset with head(), and check the class balance with table(creditcard$Class).
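
As a minimal sketch of these console checks, the snippet below runs them on a small synthetic stand-in, since the real creditcard data is only available in the exercise workspace. The name creditcard_toy, the Amount column, and the class counts are made up for illustration.

```r
# Hypothetical stand-in for creditcard: 95 legitimate rows, 5 fraud rows.
set.seed(1)
creditcard_toy <- data.frame(
  Amount = round(runif(100, min = 1, max = 500), 2),
  Class  = factor(c(rep(0, 95), rep(1, 5)))
)

str(creditcard_toy)    # column names and types
head(creditcard_toy)   # first 6 rows

class_counts <- table(creditcard_toy$Class)
class_counts               # absolute number of cases per class
prop.table(class_counts)   # share of each class
```

On the real data, the same three calls would be applied to creditcard directly.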

Instructions

  • Load the ROSE package.
  • Specify n_new as the required number of cases in the over-sampled dataset such that the new dataset will consist of 30% fraud cases and thus 70% legitimate cases. Over-sampling leaves the legitimate cases unchanged, so you have to divide the existing number of legitimate cases by the desired proportion of legitimate cases (0.70) in the over-sampled dataset.
  • Use the function ovun.sample() for over-sampling, with Class ~ . as the formula.
  • Check the class balance of the over-sampled dataset.
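
The four steps above can be sketched as follows. This is only an illustration under stated assumptions, not the exercise solution: creditcard_toy is a hypothetical synthetic stand-in for creditcard, and the names n_legit, new_frac_legit, oversampling_result and oversampled_toy, as well as the seed values, are chosen for the example.

```r
library(ROSE)  # provides ovun.sample()

# Hypothetical stand-in for creditcard: 700 legitimate rows, 20 fraud rows.
set.seed(42)
creditcard_toy <- data.frame(
  V1     = rnorm(720),
  Amount = round(runif(720, min = 1, max = 500), 2),
  Class  = factor(c(rep(0, 700), rep(1, 20)))
)

# Over-sampling keeps every legitimate case, so those rows must make up
# 70% of the new dataset: n_new = n_legit / 0.70.
n_legit <- sum(creditcard_toy$Class == 0)
new_frac_legit <- 0.70
# round() guards against floating-point noise in the division.
n_new <- round(n_legit / new_frac_legit)

# Randomly duplicate fraud cases until the dataset has n_new rows.
oversampling_result <- ovun.sample(Class ~ ., data = creditcard_toy,
                                   method = "over", N = n_new, seed = 2018)
oversampled_toy <- oversampling_result$data

# Class balance after over-sampling, as counts and as shares.
table(oversampled_toy$Class)
prop.table(table(oversampled_toy$Class))
```

With the real data, creditcard would take the place of creditcard_toy, and n_legit would come from table(creditcard$Class). The shares printed at the end should come out close to 70% legitimate and 30% fraud.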