A more complex bagging model

Now that we have explored the semiconductor data, let's build a bagging classifier to predict the 'Pass/Fail' label from the input features.

The preprocessed dataset is available in your workspace as uci_secom, and training and test sets have already been created for you.

Because the target is highly class-imbalanced, you will use a "balanced" logistic regression as the base estimator here.
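
To make "balanced" concrete, here is a minimal sketch of how scikit-learn derives balanced class weights. The label array is a hypothetical 90/10 split, not the actual uci_secom target; each class gets the weight n_samples / (n_classes * class_count), so the rare class is weighted more heavily.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 90 "pass" (0) and 10 "fail" (1).
# This is an illustrative stand-in, not the real uci_secom target.
y = np.array([0] * 90 + [1] * 10)

# 'balanced' weights are n_samples / (n_classes * count_per_class)
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

class_weights = dict(zip(classes.tolist(), weights.tolist()))
print(class_weights)
```

With these counts the majority class gets a weight of 100 / (2 * 90), about 0.56, and the minority class gets 100 / (2 * 10) = 5.0. Passing class_weight='balanced' to LogisticRegression applies the same weighting during fitting.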

We also reduce the computation time for LogisticRegression with the parameter solver='liblinear', which is typically a faster optimizer than the default on a small dataset.
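
If you want to check the speed claim yourself, a rough timing sketch follows. It assumes a recent scikit-learn version in which the default solver is 'lbfgs', and it uses synthetic data from make_classification rather than uci_secom, so the timings are only indicative and will vary between machines.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic, imbalanced stand-in data (an assumption, not uci_secom)
X, y = make_classification(
    n_samples=1500, n_features=100, weights=[0.93, 0.07], random_state=42
)

# Time a single fit for the default solver and for liblinear
fit_seconds = {}
for solver in ("lbfgs", "liblinear"):
    model = LogisticRegression(
        class_weight="balanced", solver=solver, random_state=42
    )
    start = time.perf_counter()
    model.fit(X, y)
    fit_seconds[solver] = time.perf_counter() - start

for solver, seconds in fit_seconds.items():
    print(f"{solver}: {seconds:.4f} s")
```

Inside a bagging ensemble the base estimator is fitted once per bootstrap sample, so any per-fit saving is multiplied by the number of estimators.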

This exercise is part of the course

Ensemble Methods in Python

Exercise instructions

Instantiate a logistic regression to use as the base classifier with the parameters class_weight='balanced', solver='liblinear', and random_state=42.
Build a bagging classifier with the logistic regression as the base estimator, set the maximum number of features to 10, and include the out-of-bag score.
Print the out-of-bag score to compare it with the accuracy.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Build a balanced logistic regression
clf_lr = ____

# Build and fit a bagging classifier
clf_bag = ____(____, ____, ____, random_state=500)
clf_bag.fit(X_train, y_train)

# Evaluate the accuracy on the test set and show the out-of-bag score
pred = clf_bag.predict(X_test)
print('Accuracy:  {:.2f}'.format(accuracy_score(y_test, pred)))
print('OOB-Score: {:.2f}'.format(____))

# Print the confusion matrix
print(confusion_matrix(y_test, pred))
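
For reference, here is one way the steps above might look once filled in. This is a sketch under stated assumptions, not the course's official solution: it generates a synthetic imbalanced dataset with make_classification as a stand-in for uci_secom, and creates its own X_train, X_test, y_train, and y_test instead of using the ones in the workspace.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for uci_secom (an assumption)
X, y = make_classification(
    n_samples=1500, n_features=50, n_informative=10,
    weights=[0.93, 0.07], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Build a balanced logistic regression
clf_lr = LogisticRegression(
    class_weight="balanced", solver="liblinear", random_state=42
)

# Build and fit a bagging classifier: each base model sees 10 features,
# and the out-of-bag score is computed during fitting
clf_bag = BaggingClassifier(
    clf_lr, max_features=10, oob_score=True, random_state=500
)
clf_bag.fit(X_train, y_train)

# Evaluate the accuracy on the test set and show the out-of-bag score
pred = clf_bag.predict(X_test)
print("Accuracy:  {:.2f}".format(accuracy_score(y_test, pred)))
print("OOB-Score: {:.2f}".format(clf_bag.oob_score_))

# Print the confusion matrix
print(confusion_matrix(y_test, pred))
```

With the default of 10 estimators, scikit-learn may warn that some samples have no out-of-bag prediction; raising n_estimators usually removes the warning. On imbalanced data, read the confusion matrix alongside the accuracy, since a high accuracy can hide poor detection of the minority class.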


Do you struggle to determine which of the models you built is the best for your problem? You should give up on that, and use them all instead! In this chapter, you'll learn how to combine multiple models into one using "Voting" and "Averaging". You'll use these to predict the ratings of apps on the Google Play Store, whether or not a Pokémon is legendary, and which characters are going to die in Game of Thrones!

Exercise 1: Introduction to ensemble methods
Exercise 2: Exploring Google apps data
Exercise 3: Predicting the rating of an app
Exercise 4: Voting
Exercise 5: Choosing the best model
Exercise 6: Assembling your first ensemble
Exercise 7: Evaluating your ensemble
Exercise 8: Averaging
Exercise 9: Journey to Westeros
Exercise 10: Predicting GoT deaths
Exercise 11: Soft vs. hard voting

Bagging is the ensemble method behind powerful machine learning algorithms such as random forests. In this chapter you'll learn the theory behind this technique and build your own bagging models using scikit-learn.

Exercise 1: The power of "weak" models
Exercise 2: Restricted and unrestricted decision trees
Exercise 3: "Weak" decision tree
Exercise 4: Bootstrap aggregating
Exercise 5: Training with bootstrapping
Exercise 6: A first attempt at bagging
Exercise 7: BaggingClassifier: nuts and bolts
Exercise 8: Bagging: the scikit-learn way
Exercise 9: Checking the out-of-bag score
Exercise 10: Bagging parameters: tips and tricks
Exercise 11: Exploring the UCI SECOM data
Exercise 12: A more complex bagging model (current exercise)
Exercise 13: Tuning bagging hyperparameters

Boosting is a class of ensemble learning algorithms that includes award-winning models such as AdaBoost. In this chapter, you'll learn about this award-winning model, and use it to predict the revenue of award-winning movies! You'll also learn about gradient boosting algorithms such as CatBoost and XGBoost.

Exercise 1: The effectiveness of gradual learning
Exercise 2: Introducing the movie database
Exercise 3: Exploring movie features
Exercise 4: Predicting movie revenue
Exercise 5: Boosting for predicted revenue
Exercise 6: Adaptive boosting: award winning model
Exercise 7: Your first AdaBoost model
Exercise 8: Tree-based AdaBoost regression
Exercise 9: Making the most of AdaBoost
Exercise 10: Gradient boosting
Exercise 11: Revisiting Google app reviews
Exercise 12: Sentiment analysis with GBM
Exercise 13: Gradient boosting flavors
Exercise 14: Movie revenue prediction with CatBoost
Exercise 15: Boosting contest: Light vs Extreme

Get ready to see how things stack up! In this final chapter you'll learn about the stacking ensemble method. You'll learn how to implement it using scikit-learn as well as with the mlxtend library! You'll apply stacking to predict the edibility of North American mushrooms, and revisit the ratings of Google apps with this more advanced approach.

Exercise 1: The intuition behind stacking
Exercise 2: Exploring the mushroom dataset
Exercise 3: Predicting mushroom edibility
Exercise 4: K-nearest neighbors for mushrooms
Exercise 5: Build your first stacked ensemble
Exercise 6: Applying stacking to predict app ratings
Exercise 7: Building the stacking classifier
Exercise 8: Stacked predictions for app ratings
Exercise 9: Let's mlxtend it!
Exercise 10: A first attempt with mlxtend
Exercise 11: Back to regression with stacking
Exercise 12: Mushrooms: a matter of life or death
Exercise 13: Ensembling it all together