A more complex bagging model
Having explored the semiconductor data, let's now build a bagging classifier to predict the 'Pass/Fail'
label given the input features.
The preprocessed dataset is available in your workspace as uci_secom, and training and test sets have been created for you.
As the target is highly imbalanced, use a "balanced" logistic regression as the base estimator here, that is, one that weights each class inversely to its frequency.
We will also reduce the computation time for LogisticRegression with the parameter solver='liblinear', which tends to be faster than the default solver on a small dataset like this one.
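To make the "balanced" setting concrete, here is a minimal sketch of how scikit-learn derives the class weights: n_samples / (n_classes * count of each class). The 90/10 label vector is a made-up toy example, not the SECOM target.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Assumption: toy labels with 90 "pass" (0) and 10 "fail" (1) samples
y = np.array([0] * 90 + [1] * 10)
classes = np.unique(y)

# Weights that class_weight='balanced' would assign to each class
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# Same values computed by hand: n_samples / (n_classes * class counts)
manual = len(y) / (len(classes) * np.bincount(y))

print(dict(zip(classes.tolist(), weights.round(4).tolist())))
```

With these toy labels the rare class gets a weight of 5.0 and the common class about 0.56, so errors on the rare class cost roughly nine times more during fitting.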
This exercise is part of the course Ensemble Methods in Python.
Exercise instructions
- Instantiate a logistic regression to use as the base classifier with the parameters class_weight='balanced', solver='liblinear', and random_state=42.
- Build a bagging classifier using the logistic regression as the base estimator, specifying the maximum number of features as 10, and including the out-of-bag score.
- Print the out-of-bag score to compare to the accuracy.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Build a balanced logistic regression
clf_lr = ____
# Build and fit a bagging classifier
clf_bag = ____(____, ____, ____, random_state=500)
clf_bag.fit(X_train, y_train)
# Evaluate the accuracy on the test set and show the out-of-bag score
pred = clf_bag.predict(X_test)
print('Accuracy: {:.2f}'.format(accuracy_score(y_test, pred)))
print('OOB-Score: {:.2f}'.format(____))
# Print the confusion matrix
print(confusion_matrix(y_test, pred))
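
A filled-in pipeline of this kind can be sketched end to end on synthetic data. This is a minimal sketch under stated assumptions, not the course solution: it uses a toy imbalanced dataset from make_classification in place of uci_secom (which is not reproduced here), and it sets n_estimators=50, which the exercise does not ask for, so that every training row is very likely left out of at least one bootstrap sample.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for uci_secom with roughly 7% positives
X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           weights=[0.93, 0.07], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Balanced logistic regression as the base classifier
clf_lr = LogisticRegression(class_weight='balanced', solver='liblinear',
                            random_state=42)

# Bagging: each estimator sees a bootstrap sample and 10 randomly drawn features.
# The base estimator is passed positionally, which works across scikit-learn versions.
# n_estimators=50 is an assumption of this sketch, not an exercise requirement.
clf_bag = BaggingClassifier(clf_lr, n_estimators=50, max_features=10,
                            oob_score=True, random_state=500)
clf_bag.fit(X_train, y_train)

# Test-set accuracy next to the out-of-bag estimate from the training data
pred = clf_bag.predict(X_test)
print('Accuracy: {:.2f}'.format(accuracy_score(y_test, pred)))
print('OOB-Score: {:.2f}'.format(clf_bag.oob_score_))
print(confusion_matrix(y_test, pred))
```

The out-of-bag score is an accuracy computed only from training rows that a given estimator did not see, so it should land near the test accuracy. On imbalanced data both numbers can hide weak detection of the rare class, which is why the confusion matrix is printed as well.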