Shrinking the majority group
Rather than increasing the number of fraud cases, you can balance the dataset by randomly removing legitimate cases. Let's under-sample the majority class (Class = 0) in the creditcard dataset. You can run table() in the console to see how many fraudulent and legitimate transactions the dataset contains.
This exercise is part of the course Fraud Detection in R.
Exercise instructions
- Load the ROSE library.
- Specify n_new as the required number of cases in the under-sampled dataset such that the new dataset will consist of 40% fraud cases. For this, you have to divide the number of fraud cases by the desired percentage of fraud cases in the under-sampled dataset.
- Under-sample the dataset.
- Use table() and prop.table() to check the class-balance of the under-sampled dataset.
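The arithmetic behind the target size can be sketched as follows. The fraud count used here is a hypothetical number chosen for illustration, not the actual count in creditcard, and the names n_fraud, fraud_share and n_legit_kept are assumptions introduced only for this sketch.

```r
# Hypothetical fraud count, for illustration only (not taken from creditcard)
n_fraud <- 200

# Desired share of fraud cases in the under-sampled dataset
fraud_share <- 0.40

# Under-sampling keeps every fraud case, so the total size follows from
# n_fraud / n_new = fraud_share
n_new <- n_fraud / fraud_share

# Number of legitimate cases that remain after under-sampling
n_legit_kept <- n_new - n_fraud

print(n_new)
print(n_legit_kept)
```

Because all fraud cases are retained, the only quantity that changes is how many legitimate cases survive the random removal.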
Hands-on interactive exercise
Finish this exercise by completing the sample code.
# Load ROSE
___
# Calculate the required number of cases in the under-sampled dataset
n_new <- ___
# Under-sample
undersampling_result <- ___(formula = ___, data = ___,
                            ___ = ___, ___ = ___, seed = 2018)
# Verify the Class-balance of the under-sampled dataset
undersampled_credit <- undersampling_result$___
___(___(___))
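For readers who want to see the whole workflow run end to end, here is a minimal sketch on synthetic data. It assumes the ROSE package is installed and uses its ovun.sample() function with method = "under". The data frame toy, its Amount column and the class counts are invented stand-ins for creditcard, so this is an illustration of the technique rather than the exercise solution.

```r
# Assumes ROSE is installed: install.packages("ROSE")
library(ROSE)

# Synthetic stand-in for creditcard (hypothetical data):
# 950 legitimate cases (Class = 0) and 50 fraud cases (Class = 1)
set.seed(1)
toy <- data.frame(
  Amount = c(rexp(950, rate = 1 / 50), rexp(50, rate = 1 / 200)),
  Class  = factor(c(rep(0, 950), rep(1, 50)))
)

# Target size so that the 50 fraud cases make up 40% of the result;
# round() guards against floating-point noise in the division
n_new <- round(sum(toy$Class == 1) / 0.40)

# method = "under" randomly drops majority-class rows and keeps all
# minority-class rows; N is the desired total number of rows
res <- ovun.sample(Class ~ ., data = toy, method = "under",
                   N = n_new, seed = 2018)
toy_under <- res$data

# Check the class balance of the under-sampled data
print(table(toy_under$Class))
print(prop.table(table(toy_under$Class)))
```

With these toy counts the fraud share in toy_under should come out at about 40%, since every fraud row is kept and only legitimate rows are removed.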