Your first pipeline
Your colleague has used AdaBoostClassifier for the credit scoring dataset, and you also want to try a random forest classifier. In this exercise, you will fit a random forest to the data and compare it to AdaBoostClassifier. Split the data into training and test sets so that performance is measured on data the model has not seen; otherwise an overfit model would go undetected. The data is preloaded and transformed so that all features are numeric. The features are available as X and the labels as y. The RandomForestClassifier class has also been preloaded.
This exercise is part of the course Designing Machine Learning Workflows in Python.
Have a go at this exercise by completing this sample code.
# Split the data into train and test, with 20% as test
X_train, ____, ____, y_test = train_test_split(
    X, y, ____=0.2, random_state=1)
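
For readers without access to the preloaded data, the comparison described above could look roughly like the following sketch. It is not the course's reference solution. The synthetic dataset from make_classification, the names X_demo and y_demo, the default hyperparameters, and the use of accuracy as the metric are all assumptions made for illustration; the real exercise uses the preloaded X and y.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit scoring data (assumption: the real
# X and y are preloaded in the exercise and are not reproduced here).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, random_state=1)

# Hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1)

# Fit both classifiers on the training split and score them on the
# held-out split, so the comparison reflects unseen data.
models = {
    "random_forest": RandomForestClassifier(random_state=1),
    "adaboost": AdaBoostClassifier(random_state=1),
}
scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

print(scores)
```

Scoring both models on the same held-out split keeps the comparison fair. A single split can still be noisy on a small dataset, so differences of a point or two should not be over-interpreted.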