A more complex bagging model
Having explored the semiconductor data, let's now build a bagging classifier to predict the 'Pass/Fail' label from the input features.
The preprocessed dataset is available in your workspace as uci_secom, and training and test sets have been created for you.
As the target has a high class imbalance, use a "balanced" logistic regression as the base estimator here. 
We will also reduce the computation time of LogisticRegression by setting solver='liblinear', which is a faster optimizer than the default for this problem.
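For reference, a minimal sketch of what such a base estimator could look like, assuming scikit-learn's LogisticRegression (class_weight='balanced' reweights each class inversely to its frequency):

from sklearn.linear_model import LogisticRegression

# Class-weighted logistic regression to counter the Pass/Fail imbalance;
# 'liblinear' keeps fitting fast on this dataset.
clf_lr = LogisticRegression(class_weight='balanced',
                            solver='liblinear',
                            random_state=42)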
This exercise is part of the course
Ensemble Methods in Python
Instructions
- Instantiate a logistic regression to use as the base classifier with the parameters class_weight='balanced', solver='liblinear', and random_state=42.
- Build a bagging classifier using the logistic regression as the base estimator, specifying the maximum number of features as 10, and including the out-of-bag score.
- Print the out-of-bag score to compare to the accuracy.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Build a balanced logistic regression
clf_lr = ____
# Build and fit a bagging classifier
clf_bag = ____(____, ____, ____, random_state=500)
clf_bag.fit(X_train, y_train)
# Evaluate the accuracy on the test set and show the out-of-bag score
pred = clf_bag.predict(X_test)
print('Accuracy:  {:.2f}'.format(accuracy_score(y_test, pred)))
print('OOB-Score: {:.2f}'.format(____))
# Print the confusion matrix
print(confusion_matrix(y_test, pred))
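For reference, one possible completion of the scaffold above. This is a sketch assuming scikit-learn's BaggingClassifier and metrics; the base estimator is passed positionally because its parameter name differs between scikit-learn versions (base_estimator vs. estimator), and X_train, y_train, X_test, y_test are the pre-built splits from the workspace:

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Balanced logistic regression as the base estimator
clf_lr = LogisticRegression(class_weight='balanced', solver='liblinear', random_state=42)

# Bagging classifier: draw at most 10 features per estimator and keep the out-of-bag score
clf_bag = BaggingClassifier(clf_lr, max_features=10, oob_score=True, random_state=500)
clf_bag.fit(X_train, y_train)

# Test-set accuracy compared with the out-of-bag estimate
pred = clf_bag.predict(X_test)
print('Accuracy:  {:.2f}'.format(accuracy_score(y_test, pred)))
print('OOB-Score: {:.2f}'.format(clf_bag.oob_score_))

# Confusion matrix on the test set
print(confusion_matrix(y_test, pred))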