
A more complex bagging model

Having explored the semiconductor data, let's now build a bagging classifier to predict the 'Pass/Fail' label from the input features.

The preprocessed dataset is available in your workspace as uci_secom, and training and test sets have been created for you.

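Because the target is highly imbalanced, use a "balanced" logistic regression (class_weight='balanced') as the base estimator. This setting weights each class inversely to its frequency, so the rare failures are not swamped by the many passes.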
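To see what "balanced" means numerically, the sketch below computes the weights by hand and checks them against scikit-learn's `compute_class_weight`. The labels are hypothetical (90 passes coded as 0, 10 failures coded as 1), not taken from uci_secom.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 90 passes (0) and 10 failures (1)
y = np.array([0] * 90 + [1] * 10)
classes = np.unique(y)

# "balanced" weight for class c = n_samples / (n_classes * number of samples in c)
manual = len(y) / (len(classes) * np.bincount(y))
sklearn_weights = compute_class_weight("balanced", classes=classes, y=y)

print(manual)           # roughly [0.56, 5.0]: each failure counts about 9x as much as a pass
print(sklearn_weights)  # same values as the manual computation
```

The minority class ends up with the larger weight, which is exactly the effect the exercise relies on.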

We will also reduce the computation time of LogisticRegression by setting solver='liblinear', which is typically faster than the default solver ('lbfgs' in recent scikit-learn versions) on small datasets such as this one.

This exercise is part of the course Ensemble Methods in Python.


Exercise instructions

  • Instantiate a logistic regression to use as the base classifier with the parameters: class_weight='balanced', solver='liblinear', and random_state=42.
  • Build a bagging classifier with the logistic regression as the base estimator, setting the maximum number of features to 10 and enabling the out-of-bag score.
  • Print the out-of-bag score to compare it with the test accuracy.
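The steps above can be sketched end to end as follows. Since uci_secom exists only in the course workspace, this version substitutes a synthetic imbalanced dataset from `make_classification` (an assumption: the 1000 rows, 50 features, and roughly 7% minority share are illustrative, not the real SECOM figures), and the train/test split parameters are likewise illustrative. The estimator settings follow the instructions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for uci_secom (assumption): about 93% class 0, 7% class 1
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,
                           weights=[0.93], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Base estimator: class-weighted logistic regression fitted with liblinear
clf_lr = LogisticRegression(class_weight="balanced", solver="liblinear", random_state=42)

# Bagging: each estimator is fitted on a bootstrap sample using 10 randomly drawn
# features; oob_score=True scores each training row with the estimators that did not see it
clf_bag = BaggingClassifier(clf_lr, max_features=10, oob_score=True, random_state=500)
clf_bag.fit(X_train, y_train)

pred = clf_bag.predict(X_test)
print("Accuracy:  {:.2f}".format(accuracy_score(y_test, pred)))
print("OOB-Score: {:.2f}".format(clf_bag.oob_score_))
print(confusion_matrix(y_test, pred))

# Each fitted estimator keeps the indices of the 10 features it was trained on
print([len(f) for f in clf_bag.estimators_features_])
```

The base estimator is passed positionally because its keyword was renamed from `base_estimator` to `estimator` in scikit-learn 1.2, so the positional form works across versions. With the default 10 estimators, a few training rows may never be out of bag, and scikit-learn may warn about this; the OOB score is still computed.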

Sample code

The code below builds, fits, and evaluates the bagging classifier on the training and test sets.

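# Imports (X_train, X_test, y_train, y_test are provided in the workspace)
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Build a balanced logistic regression
clf_lr = LogisticRegression(class_weight='balanced', solver='liblinear', random_state=42)

# Build and fit a bagging classifier (10 features per estimator, out-of-bag scoring)
clf_bag = BaggingClassifier(clf_lr, max_features=10, oob_score=True, random_state=500)
clf_bag.fit(X_train, y_train)

# Evaluate the accuracy on the test set and show the out-of-bag score
pred = clf_bag.predict(X_test)
print('Accuracy:  {:.2f}'.format(accuracy_score(y_test, pred)))
print('OOB-Score: {:.2f}'.format(clf_bag.oob_score_))

# Print the confusion matrix
print(confusion_matrix(y_test, pred))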
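Printing the confusion matrix matters because, with a target this imbalanced, accuracy alone can look good for a model that never flags a failure. The hypothetical labels below (not SECOM data) show a trivial "always pass" predictor reaching 0.9 accuracy while catching none of the failing parts; the failure recall and balanced accuracy expose this.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix

# Hypothetical labels: 90 passing parts (0) and 10 failing parts (1)
y_true = np.array([0] * 90 + [1] * 10)
# A trivial model that always predicts "pass"
y_pred = np.zeros_like(y_true)

cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
print(cm)

acc = accuracy_score(y_true, y_pred)               # 0.9
recall_fail = cm[1, 1] / cm[1].sum()               # share of failing parts caught: 0.0
bal_acc = balanced_accuracy_score(y_true, y_pred)  # mean per-class recall: 0.5
print(acc, recall_fail, bal_acc)
```

The bottom row of the matrix (true failures) is where a useful model for this dataset should improve, which is why the exercise uses class weighting rather than relying on accuracy.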