Frequency features
1. Frequency features
In this lesson, we will create frequency-related features that help to detect fraud.
2. Need for additional features
We illustrate the techniques on the dataset "trans". It contains credit payments made by Alice and Bob. The first column, "fraud-flag", equals zero if the transfer is legitimate and one if the transfer is fraudulent. The dataset contains two fraudulent transfers, which Alice and Bob did not authorize. Other columns include the beneficiary country, the authentication method, the payment channel, and the transferred amount. We can see that 1531 dollars were stolen from Alice and 3779 dollars were stolen from Bob. The columns of a dataset are called variables, attributes or features. We need features that capture the suspicious behavior of fraud cases. The original features, however, may be insufficient to detect anomalies. So, we create new features that distinguish the fraudulent transfers from the regular ones.
3. Alice's & Bob's profile
Each time Alice makes a transfer, she authenticates herself using one of five possible authentication methods. She used authentication methods 1, 3 and 5 several times. Method 4 was used only once, when the fraudster pretended to be Alice and stole from her. So when someone logs into Alice's bank account using an authentication method other than 1, 3 or 5, this could be considered suspicious.
4. Alice's & Bob's profile
Bob, on the other hand, used mostly authentication methods 2 and 4. To characterize Alice's and Bob's authentication preferences, we create a frequency feature that keeps track of how frequently a person used a specific authentication method. So, a frequency feature quantifies a certain aspect of someone's individual profile.
5. Frequency feature for one account
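As a rough illustration of the profiles described above, the usage counts per person can be tabulated in base R. The data frame below is a made-up stand-in for "trans" (the real dataset is not reproduced in this transcript), and the column names account_name and authentication_cd are assumptions.

```r
# Hypothetical stand-in for "trans"; the values are invented for illustration only
trans_toy <- data.frame(
  account_name      = c(rep("Alice", 8), rep("Bob", 6)),
  authentication_cd = c(3, 3, 3, 1, 1, 5, 5, 4,  # Alice: methods 1, 3, 5, and method 4 once
                        2, 4, 2, 2, 4, 4)         # Bob: mostly methods 2 and 4
)

# Cross-tabulate how often each person used each authentication method
profile <- table(trans_toy$account_name, trans_toy$authentication_cd)
print(profile)
```

A method that appears only once (or never) in a person's row is the kind of rare behavior the frequency feature is meant to expose.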
First, we arrange the data according to the timestamp at which the transfers were booked. For this, we can use the "arrange" function from the "dplyr" package.
6. Frequency feature for one account
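The sorting step might look like the sketch below. The small data frame is a hypothetical stand-in for "trans", and the column name timestamp is an assumption, since the transcript does not show the actual column names.

```r
library(dplyr)

# Hypothetical stand-in for "trans"; values are invented for illustration only
trans_toy <- data.frame(
  transfer_id = c("t3", "t1", "t2"),
  timestamp   = c(30, 10, 20)
)

# Sort the transfers from oldest to newest
trans_toy <- trans_toy %>% arrange(timestamp)
print(trans_toy)
```

Sorting first matters because the frequency feature counts only transfers that happened earlier in time.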
We start by creating a frequency feature for Alice's account only. Once we know how to do this, we can extend it to multiple accounts.
7. Frequency feature for one account
For her first transfer, Alice used authentication method 3. Since it's her first transfer, its frequency is zero.
8. Frequency feature for one account (step 1)
For her second transfer, she decided to use authentication method 3 again. She used this method once before, so its frequency is one. The function frequency_fun calculates this frequency. It needs two inputs: the number of previous transfers and their corresponding authentication methods. The function counts how many transfers have been made before that used the same authentication method as the latest transfer.
9. Frequency feature for one account (step 1)
For her third transfer, she used authentication method 3 once again. She used the same authentication method twice before, so its frequency is two.
10. Frequency feature for one account (step 1)
For her next transfer, she used authentication method 1. Since she never used this method before, its frequency is zero.
11. Frequency feature for one account (step 1)
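The transcript does not show the body of frequency_fun, so the following is only one possible way to write it, assuming its first argument holds the previous transfers (only their count is used) and its second argument holds the authentication methods of all transfers in time order. The argument names steps and auth_method are assumptions. The example replays Alice's first four transfers (methods 3, 3, 3, 1) from the walkthrough above.

```r
# Sketch of frequency_fun (argument names are assumptions, not from the transcript)
frequency_fun <- function(steps, auth_method) {
  n <- length(steps)  # number of previous transfers
  # Count earlier transfers that used the same method as transfer n + 1
  sum(auth_method[seq_len(n)] == auth_method[n + 1])
}

# Alice's first four authentication methods, as described in the walkthrough
auth <- c(3, 3, 3, 1)

# For transfer i, pass the i - 1 previous transfers
freq <- sapply(seq_along(auth), function(i) frequency_fun(seq_len(i - 1), auth))
print(freq)  # 0 1 2 0
```

Using seq_len(n) rather than 1:n keeps the function safe when there are no previous transfers, in which case it returns zero.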
Now let's automate this process in R!
12. Frequency feature for one account (step 2)
We use the function "rollapply" from the "zoo" package on the transfer_id column. That way, the function frequency_fun is applied to each transfer consecutively. Specify the parameter "width" as a list of offsets running from -1 down to minus the length of the transfer_id column, so that the window for each transfer covers all earlier transfers. Also set the parameter "partial" to TRUE, so that the function is still applied when only part of that window exists.
13. Frequency feature for one account (step 2 & 3)
Lastly, the "rollapply" function will skip the first transfer, because there are no earlier transfers in its window. Therefore, we add a leading zero to the feature.
14. Result!
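Steps 2 and 3 could be sketched as follows. The data frame trans_Alice_toy is a hypothetical stand-in for Alice's transfers, already sorted by time. Only the first four authentication methods come from the walkthrough; the rest are invented, with the final transfer standing in for the fraud. The column name authentication_cd and the body of frequency_fun are assumptions.

```r
library(zoo)

# Sketch of frequency_fun (its actual body is not shown in the transcript)
frequency_fun <- function(steps, auth_method) {
  n <- length(steps)  # number of previous transfers
  sum(auth_method[seq_len(n)] == auth_method[n + 1])
}

# Hypothetical data for Alice; the last row plays the role of the fraudulent transfer
trans_Alice_toy <- data.frame(
  transfer_id       = 1:7,
  authentication_cd = c(3, 3, 3, 1, 5, 1, 4)
)

n_transfers <- length(trans_Alice_toy$transfer_id)

# Step 2: apply frequency_fun to each transfer, using all earlier transfers as the window
freq_auth <- rollapply(
  trans_Alice_toy$transfer_id,
  width   = list(-1:-n_transfers),  # offsets -1, -2, ..., -n_transfers
  partial = TRUE,                   # allow windows that are only partly available
  FUN     = frequency_fun,
  trans_Alice_toy$authentication_cd # passed on to frequency_fun as auth_method
)

# Step 3: the first transfer is skipped, so prepend a zero
trans_Alice_toy$freq_auth <- c(0, freq_auth)
print(trans_Alice_toy)
```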
This is the result! Notice that the fraudulent transfer has a frequency of zero, meaning that the authentication method had never been used before.
15. For multiple accounts
Now we calculate the frequency feature for all accounts at once. First, group the data according to "account_name" with the function "group_by". Then add the feature to the dataset with the function "mutate".
16. Result for multiple accounts
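A minimal sketch of the grouped version is shown below. The data frame is a hypothetical stand-in for "trans" with invented values, where the last transfer of each person stands in for the fraud. The column names timestamp, authentication_cd and fraud_flag, the new feature name freq_auth, and the body of frequency_fun are all assumptions.

```r
library(dplyr)
library(zoo)

# Sketch of frequency_fun (its actual body is not shown in the transcript)
frequency_fun <- function(steps, auth_method) {
  n <- length(steps)  # number of previous transfers
  sum(auth_method[seq_len(n)] == auth_method[n + 1])
}

# Hypothetical data; Alice's and Bob's transfers are interleaved in time
trans_toy <- data.frame(
  account_name      = rep(c("Alice", "Bob"), each = 6),
  transfer_id       = 1:12,
  timestamp         = c(1, 3, 5, 7, 9, 11, 2, 4, 6, 8, 10, 12),
  authentication_cd = c(3, 3, 3, 1, 1, 4, 2, 4, 2, 2, 4, 1),
  fraud_flag        = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)
)

trans_toy <- trans_toy %>%
  arrange(timestamp) %>%
  group_by(account_name) %>%   # compute the feature separately per account
  mutate(freq_auth = c(0, rollapply(
    transfer_id,
    width   = list(-1:-length(transfer_id)),
    partial = TRUE,
    FUN     = frequency_fun,
    authentication_cd
  ))) %>%
  ungroup()

print(trans_toy)
```

Grouping before mutate means each person's history is counted on its own, so Bob's use of a method does not raise the frequency for Alice.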
Notice that both frauds have a frequency of zero. So frequency features can help detect anomalies.
17. Let's practice!
Now it's your turn to create frequency features.