
Random over-sampling

Only a very small fraction of the credit transfers are fraudulent. You will now over-sample the fraud cases to balance the class distribution. The feature Class in the dataset creditcard takes the value 1 for fraud and 0 otherwise.

You can use the console to display the columns of creditcard with str(), print the first 6 rows of the dataset with head(), and check the class balance with table(creditcard$Class).
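
As a rough illustration of those three inspection calls, here is a minimal sketch on a small made-up data frame. The name `toy` and its values are hypothetical stand-ins; the real creditcard dataset has more rows and columns.

```r
# Hypothetical stand-in for 'creditcard' (values invented for illustration)
toy <- data.frame(Amount = c(12.5, 80.0, 3.2, 45.9, 150.0, 9.99, 60.0, 22.1),
                  Class  = c(0, 0, 0, 1, 0, 0, 0, 0))

str(toy)           # column names and types
head(toy)          # first 6 rows
table(toy$Class)   # number of cases per class
```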

This exercise is part of the course

Fraud Detection in R

Exercise instructions

  • Load the ROSE package.
  • Specify n_new as the required number of cases in the over-sampled dataset such that the new dataset consists of 30% fraud cases and 70% legitimate cases. To do so, divide the existing number of legitimate cases by the desired proportion of legitimate cases in the over-sampled dataset (0.70).
  • Use the function ovun.sample() to over-sample, with Class ~ . as the formula.
  • Check the class balance of the over-sampled dataset.
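
The division in the second instruction works because over-sampling leaves the number of legitimate cases unchanged, so that number must equal 70% of the new total. A minimal sketch of the arithmetic, using hypothetical class counts (not the counts in creditcard) and a hypothetical helper variable `n_fraud_new`:

```r
# Hypothetical class counts, chosen for illustration only
n_legit <- 1400   # cases with Class == 0
n_fraud <- 20     # cases with Class == 1

# Every legitimate case is kept, so they must form 70% of the new total.
# round() guards against floating-point noise in the division.
n_new <- round(n_legit / 0.70)

# The rest of the over-sampled dataset must consist of fraud cases
n_fraud_new <- n_new - n_legit

print(n_new)                # 2000
print(n_fraud_new / n_new)  # 0.3
```

With these made-up counts, the 20 original fraud cases would have to be resampled up to 600 to reach the 30% share.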

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Load ROSE
___

# Calculate the total number of required cases in the over-sampled dataset
print(table(creditcard$Class))
n_new <- ___

# Over-sample
oversampling_result <- ___(formula = ___, data = ___,
                           method = ___, N = ___, seed = 2018)

# Verify the Class-balance of the over-sampled dataset
oversampled_credit <- oversampling_result$data
prop.table(___(___))
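
To show what random over-sampling does conceptually, here is a base-R sketch on made-up data. It is an illustration of the idea only, not the implementation inside ROSE and not the solution to the template above; the data frame `toy` and all helper names are hypothetical.

```r
set.seed(2018)

# Toy data (hypothetical): 70 legitimate cases (Class 0) and 5 fraud cases (Class 1)
toy <- data.frame(Amount = round(runif(75, 1, 500), 2),
                  Class  = c(rep(0, 70), rep(1, 5)))

# Total size at which the 70 legitimate cases make up 70% of the data
n_new <- round(sum(toy$Class == 0) / 0.70)

# Keep every legitimate row; draw fraud rows with replacement to fill the rest
legit_idx   <- which(toy$Class == 0)
fraud_idx   <- which(toy$Class == 1)
n_fraud_new <- n_new - length(legit_idx)

# Index into fraud_idx via sample.int(), because sample(x, ...) treats a
# length-one numeric x as 1:x, which would pick wrong rows with a single fraud case
fraud_draw <- fraud_idx[sample.int(length(fraud_idx), n_fraud_new, replace = TRUE)]

toy_over <- toy[c(legit_idx, fraud_draw), ]
print(prop.table(table(toy_over$Class)))
```

Because the fraud rows are drawn with replacement, the over-sampled data contains exact duplicates of the original fraud cases; no new information is created, only the class proportions change.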