
Computing robust z-scores

Let us look again at the transfers dataset that we used in Chapter 1. The dataset contains 222 transactions, four of which are known fraud cases, indicated by a 1 in the variable fraud_flag. We have already studied the frequency and recency features. This time we focus only on the variable amount and try to detect the fraud cases by applying univariate outlier detection techniques to it.

Don't hesitate to explore the dataset in the Console if you need to refresh your memory about its structure. You can also refer to the slides to check the functions that were shown in the previous video.

This exercise is part of the course

Fraud Detection in R


Exercise instructions

  • Find out which observations are identified as fraud.
  • Compute the median and the median absolute deviation (mad) for the variable amount.
  • Use the robust estimates for location and scatter to compute the robust z-score for each observation.
  • Which observations have a robust z-score higher than 3 in absolute value?
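The steps listed above can be illustrated on a small made-up vector before applying them to the real data. This is only a sketch: the vector x and its values are hypothetical and are not taken from the transfers dataset. Note that R's mad() multiplies the raw median absolute deviation by 1.4826 by default, so that it is comparable to a standard deviation for normally distributed data.

```r
# Hypothetical toy amounts (NOT the course's transfers data)
x <- c(10, 12, 11, 13, 12, 95)

# Robust estimates of location and scatter
m <- median(x)  # median of x
s <- mad(x)     # median absolute deviation, scaled by 1.4826 by default

# Robust z-score for each observation
robz <- (x - m) / s

# Positions of observations whose robust z-score exceeds 3 in absolute value
which(abs(robz) > 3)

# For comparison: classical z-scores based on mean and standard deviation
z <- (x - mean(x)) / sd(x)
which(abs(z) > 3)
```

On this toy vector the robust z-score flags only the last value, while the classical z-score flags nothing, because the extreme value inflates both the mean and the standard deviation.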

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Get observations identified as fraud
which(___ == ___)

# Compute median and median absolute deviation for `amount`
m <- median(___)
s <- ___(___)

# Compute robust z-score for each observation
robzscore <- (___ - ___) / ___

# Get observations with robust z-score higher than 3 in absolute value
which(abs(___) > ___)