Adjusting your Random Forest to fraud detection
In this exercise you're going to dive into the options of the random forest classifier, as we'll assign class weights and tweak the shape of the decision trees in the forest. You'll define the weights manually, to offset the class imbalance slightly. In our case we have 300 fraud to 7,000 non-fraud cases, so by setting the weight ratio to 1:12 we get to roughly a 1/3 fraud to 2/3 non-fraud ratio, which is good enough for training the model on.
The data in this exercise has already been split into a training and a test set, so you just need to focus on defining your model. You can then use the function get_model_results() as a shortcut. This function fits the model to your training data, predicts, and obtains performance metrics, similar to the steps you took in the previous exercises.
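get_model_results() is a helper provided by the course environment; its exact implementation isn't shown here, but based on the description above it might look like this minimal sketch (the function name and signature follow the exercise, the metric choices are an assumption):

```python
# Hypothetical sketch of the course helper get_model_results().
# The real implementation is supplied by the exercise environment;
# the metrics printed here (classification report and confusion
# matrix) are an assumption based on the previous exercises.
from sklearn.metrics import classification_report, confusion_matrix


def get_model_results(X_train, y_train, X_test, y_test, model):
    """Fit the model, predict on the test set, and print performance metrics."""
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    print(classification_report(y_test, predicted))
    print(confusion_matrix(y_test, predicted))
```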
This exercise is part of the course Fraud Detection in Python.
Exercise instructions
- Change the class_weight option to set the ratio to 1:12 for the non-fraud and fraud cases, and set the split criterion to 'entropy'.
- Set the maximum depth to 10.
- Set the minimum number of samples in leaf nodes to 10.
- Set the number of trees to use in the model to 20.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Change the model options
model = RandomForestClassifier(bootstrap=True, class_weight={0:____, 1:____}, criterion='____',
# Change depth of model
max_depth=____,
# Change the number of samples in leaf nodes
min_samples_leaf=____,
# Change the number of trees to use
n_estimators=____, n_jobs=-1, random_state=5)
# Run the function get_model_results
get_model_results(X_train, y_train, X_test, y_test, model)
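For reference, filling in the blanks according to the exercise instructions gives the following completed model definition (get_model_results() itself is provided by the course environment, so only the model setup is shown here):

```python
# Completed version of the sample code, following the exercise
# instructions: weight ratio 1:12, entropy criterion, max depth 10,
# at least 10 samples per leaf, and 20 trees.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(bootstrap=True,
                               class_weight={0: 1, 1: 12},
                               criterion='entropy',
                               # Change depth of model
                               max_depth=10,
                               # Change the number of samples in leaf nodes
                               min_samples_leaf=10,
                               # Change the number of trees to use
                               n_estimators=20, n_jobs=-1, random_state=5)
```

With these settings, minority (fraud) samples carry twelve times the weight of non-fraud samples during training, while the depth and leaf-size limits keep the individual trees from overfitting the small fraud class.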