Model adjustments

A simple way to adjust a random forest model for highly imbalanced fraud data is to use the class_weight option when defining your sklearn model. However, as you will see, it is a blunt mechanism and might not work for your particular case.

In this exercise you'll explore the class_weight="balanced_subsample" mode of the random forest model from the earlier exercise. You have already split your data into a training and a test set, i.e. X_train, X_test, y_train, y_test are available. The metrics functions have already been imported.
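To make the weighting concrete, here is a minimal sketch of how scikit-learn's "balanced" heuristic turns class frequencies into weights. The label array `y` is a made-up toy example (an assumption, not the course data). The "balanced_subsample" mode uses the same formula, but recomputes the weights on the bootstrap sample drawn for each tree.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical toy labels (assumption): 95 legitimate cases, 5 fraud cases
y = np.array([0] * 95 + [1] * 5)

# Weights as computed by scikit-learn's "balanced" heuristic
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)

# The same heuristic written out: n_samples / (n_classes * count per class)
manual = len(y) / (2 * np.bincount(y))

print(weights)  # roughly 0.526 for class 0 and 10.0 for class 1
print(manual)
```

The rare class receives a much larger weight, so misclassifying a fraud case costs the model more during training.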

This exercise is part of the course

Fraud Detection in Python

Exercise instructions

  • Set the class_weight argument of your classifier to balanced_subsample.
  • Fit your model to your training set.
  • Obtain predictions and probabilities from X_test.
  • Obtain the roc_auc_score, the classification report and confusion matrix.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Define the model with balanced subsample
model = RandomForestClassifier(class_weight='____', random_state=5)

# Fit your training model to your training set
model.fit(____, ____)

# Obtain the predicted values and probabilities from the model 
predicted = ____.____(____)
probs = ____.____(____)

# Print the roc_auc_score, the classification report and confusion matrix
print(____(____, ____))
print(____(____, ____))
print(____(____, ____))
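
For reference, here is one way the template above might be completed. This is a sketch under stated assumptions, not the official solution: the course data is not available here, so a synthetic imbalanced dataset from `make_classification` stands in for it, and the split parameters are arbitrary choices. Passing the positive-class column of the probabilities to `roc_auc_score` is the usual convention for binary problems.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fraud data (assumption): roughly 2% positives
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Define the model with balanced subsample weighting
model = RandomForestClassifier(class_weight="balanced_subsample", random_state=5)

# Fit the model to the training set
model.fit(X_train, y_train)

# Obtain the predicted labels and the class probabilities
predicted = model.predict(X_test)
probs = model.predict_proba(X_test)

# ROC AUC uses the probability of the positive class (second column)
auc = roc_auc_score(y_test, probs[:, 1])
print(auc)
print(classification_report(y_test, predicted))
print(confusion_matrix(y_test, predicted))
```

In the exercise itself, X_train, X_test, y_train and y_test are already defined, so only the model definition and the lines after it would be needed.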