Computing robust z-scores
Let us look again at the dataset transfers that we used in Chapter 1.
The dataset contains 222 transactions, four of which are known fraud cases, indicated with a 1 in the variable fraud_flag. We have already studied the frequency and recency features.
This time we will focus only on the variable amount and try to detect fraud cases by applying univariate outlier detection techniques to this variable.
Don't hesitate to explore the dataset in the Console if you need to refresh your memory about its structure. You can also refer to the slides to check the functions that were shown in the previous video.
This exercise is part of the course Fraud Detection in R.
Exercise instructions
- Find out which observations are identified as fraud.
- Compute the median and the median absolute deviation (mad) for the variable amount.
- Use the robust estimates for location and scatter to compute the robust z-score for each observation.
- Which observations have a robust z-score higher than 3 in absolute value?
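The exercise text does not spell out the formula, so the following is the usual textbook definition of a robust z-score, given here as a reference sketch. The symbols x_i (the amount of observation i) and c (a scaling constant) are notation introduced for this note. R's mad() function uses c = 1.4826 by default, which makes the estimate comparable to a standard deviation for normally distributed data.

```latex
z_i = \frac{x_i - \operatorname{median}(x)}{\operatorname{MAD}(x)},
\qquad
\operatorname{MAD}(x) = c \cdot \operatorname{median}_j \bigl| x_j - \operatorname{median}(x) \bigr|
```

An observation is then flagged as a potential outlier when |z_i| > 3, which is the threshold the last instruction refers to.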
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Get observations identified as fraud
which(___ == ___)
# Compute median and median absolute deviation for `amount`
m <- median(___)
s <- ___(___)
# Compute robust z-score for each observation
robzscore <- abs((___ - ___) / (___))
# Get observations with robust z-score higher than 3 in absolute value
which(abs(___) > ___)
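To see the same steps run end to end, here is a minimal sketch on a small made-up vector. The values in amount and fraud_flag below are hypothetical and are not taken from the transfers dataset; only base R functions (median, mad, which, abs) are used.

```r
# Hypothetical toy data for illustration only (not the transfers dataset)
amount     <- c(12, 15, 14, 13, 16, 15, 14, 250)
fraud_flag <- c(0, 0, 0, 0, 0, 0, 0, 1)

# Positions of the observations labelled as fraud
fraud_idx <- which(fraud_flag == 1)

# Robust estimates of location (median) and scatter (MAD)
# mad() multiplies by the constant 1.4826 by default
m <- median(amount)
s <- mad(amount)

# Robust z-score for each observation
robzscore <- (amount - m) / s

# Positions whose robust z-score exceeds 3 in absolute value
flagged <- which(abs(robzscore) > 3)

print(fraud_idx)
print(flagged)
```

Because the median and the MAD are barely affected by the single extreme value, that value stands out clearly, whereas a classical z-score based on the mean and standard deviation would be pulled toward it.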