
Adjusting your Random Forest to fraud detection

In this exercise you're going to dive into the options of the random forest classifier, assigning class weights and tweaking the shape of the decision trees in the forest. You'll define the weights manually to offset the class imbalance slightly. In this case there are 300 fraud to 7,000 non-fraud cases, so by setting the weight ratio to 1:12 you get to roughly a 1/3 fraud to 2/3 non-fraud ratio, which is good enough to train the model on.
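
To see why the 1:12 weighting gets you there, you can do a quick back-of-the-envelope check of the effective weighted class proportions. The counts below are the ones mentioned above; this snippet is just an illustration and not part of the exercise:

# Quick check of the effective class balance after weighting
n_fraud, n_non_fraud = 300, 7000
w_fraud, w_non_fraud = 12, 1

weighted_fraud = n_fraud * w_fraud              # 3600
weighted_non_fraud = n_non_fraud * w_non_fraud  # 7000

# Roughly 0.34, i.e. about 1/3 fraud to 2/3 non-fraud
print(weighted_fraud / (weighted_fraud + weighted_non_fraud))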

The data in this exercise has already been split into training and test sets, so you just need to focus on defining your model. You can then use the function get_model_results() as a shortcut. This function fits the model to your training data, makes predictions, and obtains performance metrics, similar to the steps you took in the previous exercises.
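
get_model_results() is provided by the course environment and its implementation is not shown here. A minimal sketch of what such a helper might look like, assuming it reports the ROC AUC score, classification report and confusion matrix used in the previous exercises, is:

# Hypothetical sketch of the course-provided helper, assuming these metrics
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

def get_model_results(X_train, y_train, X_test, y_test, model):
    # Fit the model to the training data
    model.fit(X_train, y_train)

    # Obtain class predictions and fraud probabilities for the test set
    predicted = model.predict(X_test)
    probs = model.predict_proba(X_test)[:, 1]

    # Print the performance metrics
    print('ROC AUC score:', roc_auc_score(y_test, probs))
    print(classification_report(y_test, predicted))
    print(confusion_matrix(y_test, predicted))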

This exercise is part of the course Fraud Detection in Python.

Exercise instructions

  • Change the weight option to set the ratio to 1 to 12 for the non-fraud and fraud cases, and set the split criterion to 'entropy'.
  • Set the maximum depth to 10.
  • Set the minimal samples in leaf nodes to 10.
  • Set the number of trees to use in the model to 20.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the random forest classifier
from sklearn.ensemble import RandomForestClassifier

# Change the model options
model = RandomForestClassifier(bootstrap=True, class_weight={0:____, 1:____}, criterion='____',

                               # Change depth of model
                               max_depth=____,

                               # Change the number of samples in leaf nodes
                               min_samples_leaf=____,

                               # Change the number of trees to use
                               n_estimators=____, n_jobs=-1, random_state=5)

# Run the function get_model_results
get_model_results(X_train, y_train, X_test, y_test, model)
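
For reference, here is one completion that is consistent with the instructions above; the parameter values come straight from the bullet points, so treat it as a sketch rather than the official solution:

# One possible completion, using the values from the instructions above
model = RandomForestClassifier(bootstrap=True, class_weight={0:1, 1:12}, criterion='entropy',
                               max_depth=10,
                               min_samples_leaf=10,
                               n_estimators=20, n_jobs=-1, random_state=5)

# Fit, predict and print the performance metrics
get_model_results(X_train, y_train, X_test, y_test, model)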