
Tuning bagging hyperparameters

While you can easily build a bagging classifier with the default parameters, tuning them is highly recommended to achieve optimal performance. Ideally, these hyperparameters should be optimized using K-fold cross-validation.

In this exercise, let's see if we can improve model performance by modifying the parameters of the bagging classifier.

Here, we also pass solver='liblinear' to LogisticRegression to reduce computation time.
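As a minimal sketch of what such a cross-validated search could look like, here is an illustrative example using scikit-learn's GridSearchCV. The synthetic data and the parameter grid below are assumptions for demonstration, not part of the exercise.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the course data (the exercise itself uses X_train/y_train)
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# In scikit-learn < 1.2 the first parameter is named base_estimator, not estimator
bag = BaggingClassifier(estimator=LogisticRegression(solver='liblinear'),
                        random_state=500)

# Candidate values for two bagging hyperparameters (illustrative choices)
param_grid = {'n_estimators': [10, 20, 50],
              'max_samples': [0.5, 0.65, 0.8]}

# 5-fold cross-validated grid search over the bagging hyperparameters
grid = GridSearchCV(bag, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_)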


Exercise instructions

  • Build a bagging classifier that uses the logistic regression model as its base estimator, with 20 base estimators, at most 10 features (max_features), 65% of the samples (max_samples=0.65), and sampling without replacement (one possible completion is sketched after the sample code below).
  • Use clf_bag to predict the labels of the test set, X_test.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Build a balanced logistic regression
clf_base = LogisticRegression(class_weight='balanced', solver='liblinear', random_state=42)

# Build and fit a bagging classifier with custom parameters
clf_bag = ____(____, ____, ____, ____, ____, random_state=500)
clf_bag.fit(X_train, y_train)

# Calculate predictions and evaluate the accuracy on the test set
y_pred = ____
print('Accuracy:  {:.2f}'.format(accuracy_score(y_test, y_pred)))

# Print the classification report
print(classification_report(y_test, y_pred))
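For reference, here is one way the blanks above might be filled in, following the exercise instructions. This assumes the exercise environment pre-imports BaggingClassifier, accuracy_score, and classification_report; note also that scikit-learn renamed the base_estimator parameter to estimator in version 1.2.

# Build and fit a bagging classifier with custom parameters
clf_bag = BaggingClassifier(estimator=clf_base,  # base_estimator in scikit-learn < 1.2
                            n_estimators=20,     # 20 base estimators
                            max_features=10,     # at most 10 features per estimator
                            max_samples=0.65,    # 65% of the training samples each
                            bootstrap=False,     # sample without replacement
                            random_state=500)
clf_bag.fit(X_train, y_train)

# Calculate predictions on the test set
y_pred = clf_bag.predict(X_test)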

This exercise is part of the course

Ensemble Methods in Python

Advanced skill level · 4.8 rating (4 reviews)

Learn how to build advanced and effective machine learning models in Python using ensemble techniques such as bagging, boosting, and stacking.

Bagging is the ensemble method behind powerful machine learning algorithms such as random forests. In this chapter you'll learn the theory behind this technique and build your own bagging models using scikit-learn.

Exercise 1: The strength of “weak” models
Exercise 2: Restricted and unrestricted decision trees
Exercise 3: “Weak” decision tree
Exercise 4: Bootstrap aggregating
Exercise 5: Training with bootstrapping
Exercise 6: A first attempt at bagging
Exercise 7: BaggingClassifier: nuts and bolts
Exercise 8: Bagging: the scikit-learn way
Exercise 9: Checking the out-of-bag score
Exercise 10: Bagging parameters: tips and tricks
Exercise 11: Exploring the UCI SECOM data
Exercise 12: A more complex bagging model
Exercise 13: Tuning bagging hyperparameters
