Random forests
Random forests are a classic and powerful ensemble method that combine individual decision trees via bootstrap aggregation (bagging for short). Two of the main hyperparameters for this type of model are the number of trees and the maximum depth of each tree. In this exercise, you will implement and evaluate a simple random forest classifier with fixed hyperparameter values.
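As a quick illustration of how those two hyperparameters map onto scikit-learn's `RandomForestClassifier` (the synthetic data here is just a stand-in for the exercise's pre-loaded datasets):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data standing in for the exercise's X_train / y_train
X, y = make_classification(n_samples=200, random_state=0)

# n_estimators controls the number of trees; max_depth caps each tree's depth
clf = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)
clf.fit(X, y)

print(len(clf.estimators_))                   # number of fitted trees
print(max(t.get_depth() for t in clf.estimators_))  # no tree exceeds max_depth
```

Limiting `max_depth` regularizes each tree, while adding trees reduces the variance of the averaged prediction.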
X_train, y_train, X_test, and y_test are available in your workspace. pandas as pd, numpy as np, and sklearn are also available. RandomForestClassifier() from sklearn.ensemble is available as well, along with roc_curve() and auc() from sklearn.metrics.
This exercise is part of the course
Predicting CTR with Machine Learning in Python
Exercise instructions
- Create a random forest classifier with 50 trees and a max depth of 5.
- Train the classifier, then get probability scores via .predict_proba() and predictions via .predict() for the testing data.
- Evaluate the AUC of the ROC curve for the classifier, first using roc_curve() to calculate fpr and tpr, and then auc() on the result.
- Evaluate the precision and recall for the classifier.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create random forest classifier with specified params
clf = ____(____ = 50, ____ = 5)
# Train classifier - predict probability score and label
y_score = clf.____(X_train, y_train).____(X_test)
y_pred = clf.____(X_train, y_train).____(X_test)
# Get ROC curve metrics
fpr, tpr, thresholds = ____(y_test, y_score[:, 1])
print("AUC of ROC: %s" % (____(fpr, tpr)))
# Get precision and recall
precision = ____(y_test, y_pred, average = 'weighted')
recall = ____(y_test, y_pred, average = 'weighted')
print("Precision: %s, Recall: %s" %(precision, recall))
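One possible completed version of the code above is sketched below. Since the exercise's pre-loaded data is not available here, a synthetic train/test split stands in for it, and precision_score and recall_score from sklearn.metrics are assumed to be the functions behind the last two blanks (the exercise text only lists roc_curve() and auc() explicitly):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, auc, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the exercise's pre-loaded X_train / y_train / X_test / y_test
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create random forest classifier with specified params
clf = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42)

# Train classifier - predict probability score and label
y_score = clf.fit(X_train, y_train).predict_proba(X_test)
y_pred = clf.predict(X_test)  # a second .fit() call, as in the template, is redundant

# Get ROC curve metrics: y_score[:, 1] is the probability of the positive class
fpr, tpr, thresholds = roc_curve(y_test, y_score[:, 1])
print("AUC of ROC: %s" % (auc(fpr, tpr)))

# Get precision and recall
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
print("Precision: %s, Recall: %s" % (precision, recall))
```

Note that chaining .fit() before both .predict_proba() and .predict(), as the template suggests, trains the model twice; fitting once and then predicting is enough.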