Random over-sampling
Only a very small fraction of the credit transfers are fraudulent. You're now going to over-sample the fraud cases in order to balance the class distribution. Feature Class in dataset creditcard takes value 1 in case of fraud and 0 otherwise.
You can use the console to display the columns of creditcard with str(), print the first 6 rows of the dataset with head(), and check the class balance with table(creditcard$Class).
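As a rough illustration of those three console checks, the sketch below runs them on a small made-up data frame. The name toy and its values are hypothetical stand-ins, not the real creditcard data.

```r
# Toy stand-in for the creditcard data (hypothetical values, not the real dataset)
toy <- data.frame(
  Amount = c(12.5, 250.0, 3.99, 78.1, 1020.0, 45.0, 9.9, 300.0, 15.0, 60.0),
  Class  = factor(c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0))
)

str(toy)    # column names and types
head(toy)   # first 6 rows

# Absolute and relative class frequencies
class_counts <- table(toy$Class)
print(class_counts)
print(prop.table(class_counts))
```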
This exercise is part of the course Fraud Detection in R.
Exercise instructions
- Load the ROSE package.
- Specify n_new as the required number of cases in the over-sampled dataset such that the new dataset will consist of 30% fraud cases and thus 70% legitimate cases. To do this, divide the existing number of legitimate cases by the desired proportion of legitimate cases in the over-sampled dataset (0.7).
- Use the function ovun.sample() for over-sampling, using Class ~ . as the formula.
- Check the class balance of the over-sampled dataset.
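The arithmetic behind n_new can be checked with made-up numbers. Over-sampling leaves the legitimate cases untouched, so they must make up 70% of the new total. The counts below (n_legit, n_fraud) are hypothetical and chosen only to make the result easy to read; the real counts come from table(creditcard$Class).

```r
# Hypothetical counts for illustration only
n_legit <- 7000              # existing legitimate cases (Class == 0)
n_fraud <- 50                # existing fraud cases (Class == 1)
desired_legit_share <- 0.70  # legitimate cases should be 70% of the new dataset

# Legitimate cases are kept as they are, so n_legit = 0.70 * n_new
n_new <- n_legit / desired_legit_share

# Everything beyond the legitimate cases has to be (re-sampled) fraud
n_fraud_new <- n_new - n_legit
fraud_share <- n_fraud_new / n_new

print(n_new)
print(fraud_share)
```

With these numbers the over-sampled dataset would hold 10,000 rows, 3,000 of them fraud. If the division does not give a whole number for the real data, rounding n_new (for example with round()) is one possible way to handle it; that step is an assumption, not part of the exercise text.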
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load ROSE
___
# Calculate the total number of required cases in the over-sampled dataset
print(table(creditcard$Class))
n_new <- ___
# Over-sample
oversampling_result <- ___(formula = ___, data = ___,
                           method = ___, N = ___, seed = 2018)
# Verify the Class-balance of the over-sampled dataset
oversampled_credit <- oversampling_result$data
prop.table(___(___))
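
One way the steps above might fit together is sketched below on a synthetic data frame. It assumes the ROSE package is installed; toy_credit, its Amount column, and the class counts are hypothetical and stand in for the real creditcard data. The method = "over" argument asks ovun.sample() to re-sample the minority class with replacement until the dataset reaches N rows.

```r
# Sketch on synthetic data; assumes the ROSE package is installed
library(ROSE)

# Hypothetical stand-in for creditcard: 700 legitimate and 20 fraud cases
set.seed(1)
toy_credit <- data.frame(
  Amount = runif(720, min = 1, max = 500),
  Class  = factor(c(rep(0, 700), rep(1, 20)))
)

# Total number of cases needed so that legitimate cases make up 70%
print(table(toy_credit$Class))
n_legit <- sum(toy_credit$Class == 0)
n_new <- round(n_legit / 0.70)  # rounding guards against floating-point noise

# Over-sample the fraud cases until the dataset has n_new rows
oversampling_result <- ovun.sample(formula = Class ~ ., data = toy_credit,
                                   method = "over", N = n_new, seed = 2018)

# Verify the class balance of the over-sampled dataset
oversampled_toy <- oversampling_result$data
balance <- table(oversampled_toy$Class)
print(prop.table(balance))
```

The legitimate cases are left unchanged here; only fraud rows are duplicated, which is why N is derived from the legitimate count alone.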