Computing robust z-scores
Let us look again at the dataset `transfers` that we used in Chapter 1. The dataset contains 222 transactions, four of which are known fraud cases, indicated with a 1 in the variable `fraud_flag`. We have already studied the frequency and recency features. This time we will focus only on the variable `amount` and try to detect fraud cases by applying univariate outlier detection techniques to it.
Don't hesitate to explore the dataset in the Console if you need to refresh your memory about its structure. You can also refer to the slides to check the functions that were shown in the previous video.
This exercise is part of the course
Fraud Detection in R
Instructions
- Find out which observations are identified as fraud.
- Compute the median and the median absolute deviation (mad) for the variable `amount`.
- Use the robust estimates for location and scatter to compute the robust z-score for each observation.
- Which observations have a robust z-score higher than 3 in absolute value?
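For reference, the robust z-score described in these steps can be written as follows. This is a sketch of the usual definition, not a formula quoted from the course: the symbols x_i and z_i are notation introduced here, and the scaling constant 1.4826 is the default of R's `mad()` function, assumed to be the estimator the exercise intends.

```latex
z_i = \frac{x_i - \operatorname{median}(x)}{\operatorname{MAD}(x)},
\qquad
\operatorname{MAD}(x) = 1.4826 \cdot \operatorname{median}\bigl(\lvert x_i - \operatorname{median}(x) \rvert\bigr)
```

An observation is then flagged as an outlier when |z_i| > 3.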
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Get observations identified as fraud
which(___ == ___)
# Compute median and median absolute deviation for `amount`
m <- median(___)
s <- ___(___)
# Compute robust z-score for each observation
robzscore <- abs((___ - ___) / (___))
# Get observations with robust z-score higher than 3 in absolute value
which(abs(___) > ___)
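To see the same steps run end to end without filling in the template, here is a minimal sketch on a made-up toy vector. The names `toy_amount`, `toy_fraud_flag`, `toy_m`, `toy_s` and `toy_robz` and all the values are hypothetical and are not taken from the `transfers` dataset; only `median()`, `mad()`, `abs()` and `which()` from base R are used.

```r
# Hypothetical toy data (NOT the transfers dataset): ten amounts, one flagged case
toy_amount     <- c(12, 15, 14, 13, 16, 15, 14, 250, 13, 15)
toy_fraud_flag <- c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0)

# Positions of the observations flagged as fraud
which(toy_fraud_flag == 1)
# → 8

# Robust location (median) and robust scatter (median absolute deviation)
toy_m <- median(toy_amount)  # 14.5
toy_s <- mad(toy_amount)     # 1.4826, using mad()'s default scaling constant

# Signed robust z-score for each observation
toy_robz <- (toy_amount - toy_m) / toy_s

# Positions whose robust z-score exceeds 3 in absolute value
which(abs(toy_robz) > 3)
# → 8
```

In this toy example the single extreme amount does not distort the median or the mad, so it stands out clearly. With the ordinary mean and standard deviation, the same extreme value would inflate both estimates and shrink its own z-score.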