
Building a random forest model

You'll again work on the Pima Indians dataset to predict whether an individual has diabetes, this time using a random forest classifier. After performing the train-test split, you'll fit the model to the training data and inspect the feature importance values.

The feature and target datasets have been pre-loaded for you as X and y, along with the necessary packages and functions.

This exercise is part of the course

Dimensionality Reduction in Python


Exercise instructions

  • Set a 25% test size to perform a 75%-25% train-test split.
  • Fit the random forest classifier to the training data.
  • Calculate the accuracy on the test set.
  • Print the feature importances per feature.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Perform a 75% training and 25% test data split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=____, random_state=0)

# Fit the random forest model to the training data
rf = RandomForestClassifier(random_state=0)
rf.____(____, ____)

# Calculate the accuracy
acc = accuracy_score(____, ____)

# Print the importances per feature
print(dict(zip(X.columns, rf.____.round(2))))

# Print accuracy
print(f"{acc:.1%} accuracy on test set.") 
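
For reference, here is one way the blanks above might be filled in, as a minimal sketch rather than the official solution. Because the pre-loaded X and y are not available outside the exercise environment, the sketch builds a synthetic stand-in with scikit-learn's make_classification; the column names f0 to f3 and the sample size are assumptions made purely for illustration, so the printed numbers will not match those from the Pima Indians data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the pre-loaded data (NOT the Pima Indians dataset).
features, target = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X = pd.DataFrame(features, columns=["f0", "f1", "f2", "f3"])  # hypothetical names
y = pd.Series(target, name="target")

# Perform a 75% training and 25% test data split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the random forest model to the training data
rf = RandomForestClassifier(random_state=0)
rf.fit(X_train, y_train)

# Calculate the accuracy by comparing true test labels with predictions
acc = accuracy_score(y_test, rf.predict(X_test))

# Pair each feature name with its importance, rounded to two decimals
importances = dict(zip(X.columns, rf.feature_importances_.round(2)))
print(importances)

# Print accuracy
print(f"{acc:.1%} accuracy on test set.")
```

The values in feature_importances_ are normalized to sum to 1, so each one can be read as that feature's share of the model's total importance.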