Shrinking the majority group
Rather than increasing the number of fraud cases in the dataset, you can randomly remove legitimate cases to balance the dataset. Let's under-sample the majority class (Class = 0) in the creditcard dataset. You can use table() in the console to see how many fraudulent and legitimate transactions there are in the dataset.
This exercise is part of the course Fraud Detection in R
Exercise instructions
- Load the ROSE library.
- Specify n_new as the required number of cases in the under-sampled dataset such that the new dataset will consist of 40% fraud cases. To do this, divide the number of fraud cases by the desired proportion of fraud cases in the under-sampled dataset.
- Under-sample the dataset.
- Use table() and prop.table() to check the class balance of the under-sampled dataset.
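The arithmetic behind n_new can be sketched on a made-up example. The toy data frame below, its 1000 rows, and its 20 fraud cases are assumptions chosen for illustration and are not the counts in the creditcard dataset. Because under-sampling keeps every fraud case, the target size is the fraud count divided by the desired fraud share.

```r
# Toy stand-in for the creditcard data (hypothetical counts):
# 1000 transactions, 20 of them fraud (Class = 1).
toy <- data.frame(
  Amount = seq_len(1000),
  Class  = c(rep(1, 20), rep(0, 980))
)

n_fraud <- sum(toy$Class == 1)  # number of minority (fraud) cases
p_fraud <- 0.4                  # desired share of fraud after under-sampling

# All fraud cases are kept, so: total size = fraud count / desired fraud share
n_new <- n_fraud / p_fraud

# Legitimate cases that remain once the majority class is shrunk
n_legit_kept <- n_new - n_fraud

print(n_new)
print(n_legit_kept)
```

In this toy setting, 20 fraud cases out of a total of 50 gives the 40% share, so 30 legitimate cases are retained.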
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load ROSE
___
# Calculate the required number of cases in the under-sampled dataset
n_new <- ___
# Under-sample
undersampling_result <- ___(formula = ___, data = ___,
___ = ___, ___ = ___, seed = 2018)
# Verify the Class-balance of the under-sampled dataset
undersampled_credit <- undersampling_result$___
___(___(___))
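For orientation, here is a minimal sketch of under-sampling with ROSE's ovun.sample() on a toy data frame rather than on creditcard. The toy data, its column names, the use of a factor for Class, and the variable names toy_under and res are assumptions made for this illustration; it is not the exercise solution.

```r
library(ROSE)

# Toy data (hypothetical): 1000 rows, 20 fraud cases (Class = 1)
toy <- data.frame(
  Amount = seq_len(1000),
  Class  = factor(c(rep(1, 20), rep(0, 980)))
)

# Target size so that fraud makes up 40% of the under-sampled data
n_new <- sum(toy$Class == 1) / 0.4

# method = "under" removes majority-class rows at random until N rows remain
res <- ovun.sample(Class ~ ., data = toy, method = "under",
                   N = n_new, seed = 2018)

# The resampled data frame is stored in the $data element of the result
toy_under <- res$data

# Check absolute and relative class frequencies
print(table(toy_under$Class))
print(prop.table(table(toy_under$Class)))
```

Setting seed makes the random removal of legitimate cases reproducible across runs.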