
Computing robust z-scores

Let us look again at the dataset transfers that we used in Chapter 1. It contains 222 transactions, four of which are known fraud cases, indicated by a 1 in the variable fraud_flag. We have already studied the frequency and recency features. This time we focus only on the variable amount and try to detect fraud cases by applying univariate outlier detection techniques to it.
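For reference, a robust z-score is the classical z-score with the mean and standard deviation replaced by the median and the median absolute deviation (MAD). The formula below assumes the MAD is scaled by 1.4826, which is the default constant used by R's mad(); an observation is then treated as an outlier when |z_i| > 3.

```latex
z_i = \frac{x_i - \operatorname{med}(x)}{\operatorname{MAD}(x)},
\qquad
\operatorname{MAD}(x) = 1.4826 \cdot \operatorname{med}_j \lvert x_j - \operatorname{med}(x) \rvert
```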

Don't hesitate to explore the dataset in the Console if you need to refresh your memory about its structure. You can also refer to the slides to check the functions that were shown in the previous video.

This exercise is part of the course

Fraud Detection in R


Instructions

  • Find out which observations are identified as fraud.
  • Compute the median and the median absolute deviation (mad) for the variable amount.
  • Use the robust estimates for location and scatter to compute the robust z-score for each observation.
  • Which observations have a robust z-score higher than 3 in absolute value?
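The four steps above can be sketched on a small made-up data frame. The data frame toy and its values are hypothetical stand-ins for transfers, which is not reproduced here; only the column names amount and fraud_flag are taken from the exercise.

```r
# Hypothetical stand-in for `transfers`: ten made-up amounts, two labelled as fraud
toy <- data.frame(
  amount     = c(20, 22, 25, 19, 21, 24, 23, 500, 18, 650),
  fraud_flag = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 1)
)

# Step 1: indices of the observations labelled as fraud
fraud_idx <- which(toy$fraud_flag == 1)

# Step 2: robust location (median) and scale (MAD);
# mad() multiplies by 1.4826 by default, making it consistent with the sd for normal data
m <- median(toy$amount)
s <- mad(toy$amount)

# Step 3: absolute robust z-score for every observation
robzscore <- abs((toy$amount - m) / s)

# Step 4: indices whose robust z-score exceeds 3
outlier_idx <- which(robzscore > 3)

print(fraud_idx)
print(outlier_idx)
```

Here m is 22.5 and s is 1.4826 × 2.5 = 3.7065, so only rows 8 and 10 exceed the threshold, which in this toy example happen to be exactly the rows labelled as fraud.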

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Get observations identified as fraud
which(___ == ___)

# Compute median and median absolute deviation for `amount`
m <- median(___)
s <- ___(___)

# Compute robust z-score for each observation
robzscore <- abs((___ - ___) / (___))

# Get observations with robust z-score higher than 3 in absolute value
which(abs(___) > ___)
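Comparing the result of the first step with that of the last one amounts to checking how well the outlier rule recovers the known fraud labels. A cross-tabulation with base R's table() is one way to do that; the data below are again hypothetical and only mimic the structure of transfers.

```r
# Hypothetical data: twelve made-up amounts with made-up fraud labels
toy <- data.frame(
  amount     = c(20, 22, 25, 19, 21, 24, 23, 500, 18, 650, 30, 45),
  fraud_flag = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0)
)

# Absolute robust z-score based on the median and mad() (default constant 1.4826)
robzscore <- abs((toy$amount - median(toy$amount)) / mad(toy$amount))

# Cross-tabulate the "robust z-score > 3" rule against the known fraud labels
tab <- table(flagged = robzscore > 3, fraud = toy$fraud_flag == 1)
print(tab)
```

In this made-up example the rule catches two of the three fraud cases, misses the fraudulent amount of 30 (robust z-score about 1.1) and flags the legitimate amount of 45 (about 3.6), which illustrates that a univariate outlier is not necessarily a fraud case, and vice versa.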