Model adjustments
A simple way to adjust the random forest model to deal with highly imbalanced fraud data is to use the class_weight option when defining your sklearn model. However, as you will see, it is a bit of a blunt-force mechanism and might not work for your particular case.
In this exercise you'll explore the class_weight="balanced_subsample" mode of the Random Forest model from the earlier exercise. You have already split your data into a training and a test set, i.e. X_train, X_test, y_train and y_test are available. The metrics functions have already been imported.
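To make the weighting less of a black box, here is a minimal sketch of the formula that scikit-learn documents for its "balanced" mode: n_samples / (n_classes * count_of_class). The "balanced_subsample" mode uses the same formula, but recomputes the weights on the bootstrap sample drawn for each tree. The helper name balanced_weights and the toy labels are illustrative assumptions, not part of the exercise.

```python
from collections import Counter

def balanced_weights(labels):
    """Hypothetical helper: weight per class as n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Toy labels (assumption): 990 legitimate cases (0) and 10 fraud cases (1)
y_toy = [0] * 990 + [1] * 10
weights = balanced_weights(y_toy)
print(weights)
```

With these toy labels the rare fraud class gets a weight of 50, while the majority class gets a weight of roughly 0.5, so each misclassified fraud case counts about a hundred times as much during training.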
This exercise is part of the course
Fraud Detection in Python
Exercise instructions
- Set the class_weight argument of your classifier to balanced_subsample.
- Fit your model to your training set.
- Obtain predictions and probabilities from X_test.
- Obtain the roc_auc_score, the classification report and confusion matrix.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Define the model with balanced subsample
model = RandomForestClassifier(class_weight='____', random_state=5)
# Fit your training model to your training set
model.fit(____, ____)
# Obtain the predicted values and probabilities from the model
predicted = ____.____(____)
probs = ____.____(____)
# Print the roc_auc_score, the classification report and confusion matrix
print(____(____, ____))
print(____(____, ____))
print(____(____, ____))
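
For reference, the following is one possible way to carry out the steps above end to end. It is a sketch under stated assumptions, not the course's official solution: the data is a synthetic stand-in generated with make_classification (about 2% positives), because the course's fraud dataset is not available here, and the imports are added so the block runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, highly imbalanced stand-in for the fraud data (assumption)
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.98, 0.02], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Define the model with balanced subsample weighting
model = RandomForestClassifier(class_weight="balanced_subsample", random_state=5)

# Fit the model to the training set
model.fit(X_train, y_train)

# Obtain the predicted labels and class probabilities for the test set
predicted = model.predict(X_test)
probs = model.predict_proba(X_test)

# roc_auc_score needs the probability of the positive class (column 1)
auc = roc_auc_score(y_test, probs[:, 1])
print(auc)
print(classification_report(y_test, predicted))
print(confusion_matrix(y_test, predicted))
```

Note that roc_auc_score is computed from the positive-class probabilities rather than the hard predictions, whereas the classification report and the confusion matrix compare y_test with the predicted labels.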