Building a random forest model
You'll again work on the Pima Indians dataset to predict whether an individual has diabetes, this time using a random forest classifier. You'll perform the train-test split, fit the model on the training data, and then inspect the feature importance values.
The feature and target datasets have been pre-loaded for you as X and y. The same goes for the necessary packages and functions.
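Feature importances in scikit-learn's RandomForestClassifier are impurity-based. Each feature's score reflects how much its splits reduce impurity, averaged over all trees, and the scores are normalized to sum to 1. The minimal sketch below assumes synthetic data in place of the Pima features. The column names `signal`, `noise_a` and `noise_b` and the variables `X_demo`, `y_demo` and `rf_demo` are hypothetical: `signal` alone determines the label and the other two columns are random, so `signal` should receive the largest importance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic stand-in data (not the Pima dataset)
rng = np.random.default_rng(0)
n = 300
X_demo = pd.DataFrame({
    "signal": rng.normal(size=n),   # fully determines the label below
    "noise_a": rng.normal(size=n),  # unrelated to the label
    "noise_b": rng.normal(size=n),  # unrelated to the label
})
y_demo = (X_demo["signal"] > 0).astype(int)

# Fit a random forest and read its impurity-based importances
rf_demo = RandomForestClassifier(random_state=0)
rf_demo.fit(X_demo, y_demo)

importances = dict(zip(X_demo.columns, rf_demo.feature_importances_.round(2)))
print(importances)
```

Impurity-based scores can still give random continuous features a nonzero share, so read them as a ranking rather than as exact measurements.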
This exercise is part of the course Dimensionality Reduction in Python.
Exercise instructions
- Set a 25% test size to perform a 75%-25% train-test split.
- Fit the random forest classifier to the training data.
- Calculate the accuracy on the test set.
- Print the feature importances per feature.
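Passing test_size=0.25 to train_test_split holds out a quarter of the rows for testing. The quick check below uses a hypothetical 100-row toy array rather than the Pima data and confirms the 75/25 split. When 25% of the row count is not a whole number, scikit-learn rounds the test count up.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 2 feature columns, alternating labels
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.arange(100) % 2

X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0
)
print(len(X_train), len(X_test))  # 75 25
```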
Have a go at this exercise by completing this sample code.
# Perform a 75% training and 25% test data split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=____, random_state=0)
# Fit the random forest model to the training data
rf = RandomForestClassifier(random_state=0)
rf.____(____, ____)
# Calculate the accuracy
acc = accuracy_score(____, ____)
# Print the importances per feature
print(dict(zip(X.columns, rf.____.round(2))))
# Print accuracy
print(f"{acc:.1%} accuracy on test set.")
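Here is one possible way to fill in the scaffold above. X and y are only pre-loaded in the course environment, so this sketch assumes a synthetic stand-in built with make_classification. The column names `feat_1` to `feat_4` are hypothetical. On the real Pima data, the accuracy and importances will differ.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for the pre-loaded Pima X and y
features, labels = make_classification(
    n_samples=400, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
X = pd.DataFrame(features, columns=["feat_1", "feat_2", "feat_3", "feat_4"])  # hypothetical names
y = pd.Series(labels)

# Perform a 75% training and 25% test data split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit the random forest model to the training data
rf = RandomForestClassifier(random_state=0)
rf.fit(X_train, y_train)

# Calculate the accuracy: true test labels first, predictions second
acc = accuracy_score(y_test, rf.predict(X_test))

# Print the importances per feature
print(dict(zip(X.columns, rf.feature_importances_.round(2))))
# Print accuracy
print(f"{acc:.1%} accuracy on test set.")
```

The accuracy is computed on X_test only, because scoring on the training rows would overstate how well the forest generalizes.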