Adjusting your Random Forest to fraud detection
In this exercise you're going to dive into the options of the random forest classifier, as we'll assign class weights and tweak the shape of the decision trees in the forest. You'll define the weights manually, to offset the class imbalance slightly. In our case we have 300 fraud to 7,000 non-fraud cases, so by setting the weight ratio to 1:12 we get to roughly a 1/3 fraud to 2/3 non-fraud ratio, which is good enough for training the model on.
The data in this exercise has already been split into a training and a test set, so you just need to focus on defining your model. You can then use the function get_model_results() as a shortcut. This function fits the model to your training data, predicts, and obtains performance metrics, similar to the steps you took in the previous exercises.
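get_model_results() is a helper provided by the course environment; its exact implementation isn't shown here, but based on the description above it might look like this minimal sketch (the function name and signature follow the exercise, the metric choices are an assumption):

```python
# Hypothetical sketch of the course helper get_model_results().
# The real implementation is supplied by the exercise environment;
# the metrics printed here (classification report and confusion
# matrix) are an assumption based on the previous exercises.
from sklearn.metrics import classification_report, confusion_matrix


def get_model_results(X_train, y_train, X_test, y_test, model):
    """Fit the model, predict on the test set, and print performance metrics."""
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    print(classification_report(y_test, predicted))
    print(confusion_matrix(y_test, predicted))
```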
This exercise is part of the course Fraud Detection in Python.
Exercise instructions
- Change the class_weight option to set the ratio to 1:12 for the non-fraud and fraud cases, and set the split criterion to 'entropy'.
- Set the maximum depth to 10.
- Set the minimum number of samples in leaf nodes to 10.
- Set the number of trees to use in the model to 20.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Change the model options
model = RandomForestClassifier(bootstrap=True, class_weight={0:____, 1:____}, criterion='____',
# Change depth of model
max_depth=____,
# Change the number of samples in leaf nodes
min_samples_leaf=____,
# Change the number of trees to use
n_estimators=____, n_jobs=-1, random_state=5)
# Run the function get_model_results
get_model_results(X_train, y_train, X_test, y_test, model)
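For reference, filling in the blanks according to the exercise instructions gives the following completed model definition (get_model_results() itself is provided by the course environment, so only the model setup is shown here):

```python
# Completed version of the sample code, following the exercise
# instructions: weight ratio 1:12, entropy criterion, max depth 10,
# at least 10 samples per leaf, and 20 trees.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(bootstrap=True,
                               class_weight={0: 1, 1: 12},
                               criterion='entropy',
                               # Change depth of model
                               max_depth=10,
                               # Change the number of samples in leaf nodes
                               min_samples_leaf=10,
                               # Change the number of trees to use
                               n_estimators=20, n_jobs=-1, random_state=5)
```

With these settings, minority (fraud) samples carry twelve times the weight of non-fraud samples during training, while the depth and leaf-size limits keep the individual trees from overfitting the small fraud class.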