Shrinking the majority group

Rather than increasing the number of fraud cases in the dataset, you can balance it by randomly removing legitimate cases. Let's under-sample the majority class (Class = 0) in the creditcard dataset. You can use table() in the console to see how many fraudulent and legitimate transactions the dataset contains.
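As a quick illustration of counting classes with table(), here is a minimal sketch on a made-up label vector. The vector `Class` and its counts (980 legitimate, 20 fraud) are hypothetical and are not taken from the creditcard dataset.

```r
# Hypothetical class labels: 0 = legitimate, 1 = fraud
Class <- c(rep(0, 980), rep(1, 20))

# Count how many cases fall into each class
class_counts <- table(Class)
class_counts

# Extract the number of fraud cases by its label
n_fraud <- class_counts[["1"]]
n_fraud
```

The table is indexed by the class label as a character string, which is why the fraud count is read with "1" rather than the number 1.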

This exercise is part of the course

Fraud Detection in R

Exercise instructions

  • Load the ROSE library.
  • Specify n_new as the required number of cases in the under-sampled dataset such that the new dataset will consist of 40% fraud cases. To get this number, divide the number of fraud cases by the desired proportion of fraud cases in the under-sampled dataset (0.4 for 40%).
  • Under-sample the dataset.
  • Use table() and prop.table() to check the class-balance of the under-sampled dataset.
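The arithmetic behind n_new can be checked by hand. The sketch below uses base R only and a made-up data frame `toy` (1000 rows, 20 of them fraud); it illustrates the idea of random under-sampling and is not how the ROSE package implements it. All names other than Class and n_new are assumptions introduced for the example.

```r
# Toy data (hypothetical): 1000 transactions, 20 of them fraud (Class = 1)
set.seed(2018)
toy <- data.frame(
  Amount = round(runif(1000, 1, 500), 2),
  Class  = c(rep(1, 20), rep(0, 980))
)

# Desired proportion of fraud cases in the under-sampled data
p_fraud <- 0.4

# Total size needed so that the existing fraud cases make up 40%
n_fraud <- sum(toy$Class == 1)
n_new   <- round(n_fraud / p_fraud)   # 20 / 0.4 = 50

# Keep every fraud case and randomly draw the rest from the legitimate cases
n_legit_keep <- n_new - n_fraud
legit_rows   <- which(toy$Class == 0)
keep_rows    <- c(which(toy$Class == 1), sample(legit_rows, n_legit_keep))
toy_under    <- toy[keep_rows, ]

# Check the class balance of the under-sampled data
table(toy_under$Class)
prop.table(table(toy_under$Class))
```

Because no fraud cases are added or removed, the fraud count stays fixed and only the total size changes, which is why dividing the fraud count by the target proportion gives the required number of rows.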

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Load ROSE
___

# Calculate the required number of cases in the under-sampled dataset
n_new <- ___

# Under-sample
undersampling_result <- ___(formula = ___, data = ___,
                           ___ = ___, ___ = ___, seed = 2018)

# Verify the Class-balance of the under-sampled dataset
undersampled_credit <- undersampling_result$___
___(___(___))