Random Forest Classifier - part 1
Let's now create a first random forest classifier for fraud detection. Hopefully you can do better than the baseline accuracy you've just calculated, which was roughly 96%. This model will serve as the "baseline" model that you're going to try to improve in the upcoming exercises. Let's start by splitting the data into a training and test set and defining the random forest model. The data available are features `X` and labels `y`.
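As a reminder of what a figure like 96% can mean, here is a minimal sketch that assumes the baseline is the accuracy of always predicting the most frequent class. The helper `majority_class_accuracy` and the labels in `y_example` are hypothetical and only illustrate the arithmetic; they are not the course data.

```python
from collections import Counter


def majority_class_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical labels (assumption): 96 non-fraud cases (0) and 4 fraud cases (1)
y_example = [0] * 96 + [1] * 4

baseline = majority_class_accuracy(y_example)
print(f"Baseline accuracy: {baseline:.2%}")
```

Any model worth keeping should beat this number, since it can be reached without learning anything from the features.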
This exercise is part of the course Fraud Detection in Python.
Exercise instructions
- Import the random forest classifier from `sklearn`.
- Split your features `X` and labels `y` into a training and test set. Set aside a test set of 30%.
- Assign the random forest classifier to `model` and keep `random_state` at 5. We need to set a random state here in order to be able to compare results across different models.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the random forest model from sklearn
from sklearn.ensemble import ____
# Split your data into training and test set
X_train, X_test, y_train, y_test = ____(____, ____, test_size=____, random_state=0)
# Define the model as the random forest
model = ____(random_state=5)
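One possible way to complete the template is sketched below. Because the course data are not included here, the sketch generates a synthetic, imbalanced stand-in for `X` and `y` with `make_classification`; that data set and its size are assumptions made purely so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the course data (assumption): 1000 rows, 10 features,
# with roughly 4% of the rows in the positive ("fraud") class
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.96], random_state=0
)

# Split the data into training and test sets, holding out 30% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Define the model as the random forest, with a fixed seed for comparability
model = RandomForestClassifier(random_state=5)

print(X_train.shape, X_test.shape)
```

With 1000 rows, a 30% test set leaves 700 rows for training and 300 for testing. Fixing `random_state` in both the split and the model means later models can be evaluated on the same rows.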