Tuning bagging hyperparameters
While you can easily build a bagging classifier using the default parameters, it is highly recommended that you tune these in order to achieve optimal performance. Ideally, the tuning should be done using K-fold cross-validation.
In this exercise, let's see if we can improve model performance by modifying the parameters of the bagging classifier.
Here we are also passing the parameter solver='liblinear' to LogisticRegression to reduce the computation time.
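For illustration, here is a minimal sketch of what such a cross-validated search could look like with scikit-learn's GridSearchCV. The grid values are illustrative, not the course's, and X_train and y_train are assumed to be preloaded as in the course environment.

# Minimal sketch: tuning a BaggingClassifier with 5-fold cross-validation.
# Grid values are illustrative; X_train and y_train are assumed preloaded.
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

clf_base = LogisticRegression(class_weight='balanced', solver='liblinear')
param_grid = {
    'n_estimators': [10, 20, 50],
    'max_samples': [0.5, 0.65, 1.0],
    'bootstrap': [True, False],
}
search = GridSearchCV(BaggingClassifier(clf_base), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)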
Exercise instructions
- Build a bagging classifier using the logistic regression as its base estimator, with 20 base estimators, 10 maximum features, 0.65 (65%) maximum samples (max_samples), and sampling without replacement.
- Use clf_bag to predict the labels of the test set, X_test.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Build a balanced logistic regression
clf_base = LogisticRegression(class_weight='balanced', solver='liblinear', random_state=42)
# Build and fit a bagging classifier with custom parameters
clf_bag = ____(____, ____, ____, ____, ____, random_state=500)
clf_bag.fit(X_train, y_train)
# Calculate predictions and evaluate the accuracy on the test set
y_pred = ____
print('Accuracy: {:.2f}'.format(accuracy_score(y_test, y_pred)))
# Print the classification report
print(classification_report(y_test, y_pred))
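For reference, here is one way the scaffold might be completed. This is a sketch, not the official course solution: it assumes X_train, y_train, X_test, and y_test are preloaded as in the exercise environment, and the imports are added so it runs standalone. Note that the base estimator is passed positionally because its keyword is base_estimator in scikit-learn < 1.2 but estimator in 1.2 and later.

# Sketch of a completed solution; assumes the course's preloaded data splits.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, classification_report

# Build a balanced logistic regression to use as the base estimator
clf_base = LogisticRegression(class_weight='balanced', solver='liblinear', random_state=42)

# Build and fit a bagging classifier: 20 estimators, 10 features and 65% of
# the samples per estimator, drawn without replacement (bootstrap=False)
clf_bag = BaggingClassifier(clf_base, n_estimators=20, max_features=10,
                            max_samples=0.65, bootstrap=False, random_state=500)
clf_bag.fit(X_train, y_train)

# Calculate predictions and evaluate the accuracy on the test set
y_pred = clf_bag.predict(X_test)
print('Accuracy: {:.2f}'.format(accuracy_score(y_test, y_pred)))

# Print the classification report
print(classification_report(y_test, y_pred))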
This exercise is part of the course “Ensemble Methods in Python”.
Learn how to build advanced and effective machine learning models in Python using ensemble techniques such as bagging, boosting, and stacking.
Bagging is the ensemble method behind powerful machine learning algorithms such as random forests. In this chapter you'll learn the theory behind this technique and build your own bagging models using scikit-learn.