1. Stochastic Gradient Boosting (SGB)
2. Gradient Boosting: Cons
Gradient boosting involves an exhaustive search procedure.
Each tree in the ensemble is trained to find the best split-points and the best features.
This procedure may lead to CARTs that use the same split-points and possibly the same features.
3. Stochastic Gradient Boosting
To mitigate these effects, you can use an algorithm known as stochastic gradient boosting.
In stochastic gradient boosting, each CART is trained on a random subset of the training data. This subset is sampled without replacement.
Furthermore, at the level of each node, features are sampled without replacement when choosing the best split-points.
As a result, the ensemble becomes more diverse, and the net effect is added variance across the trees.
4. Stochastic Gradient Boosting: Training
Let's take a closer look at the training procedure used in stochastic gradient boosting by examining the diagram shown on this slide.
First, instead of providing all the training instances to a tree, only a fraction of these instances is provided, sampled without replacement.
The sampled data is then used to train a tree. However, not all features are considered when a split is made; instead, only a randomly sampled fraction of the features is used for this purpose.
Once a tree is trained, predictions are made and the residual errors can be computed. These residual errors are multiplied by the learning rate eta and are fed to the next tree in the ensemble.
This procedure is repeated sequentially until all the trees in the ensemble are trained.
The prediction procedure for a new instance in stochastic gradient boosting is similar to that of gradient boosting.
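To make the procedure concrete, here is a minimal sketch of the training loop, assuming a squared-error loss and NumPy arrays X and y. The helper names train_sgb and predict_sgb are hypothetical; this is an illustration of the idea, not sklearn's actual implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_sgb(X, y, n_estimators=300, eta=0.1, subsample=0.8,
              max_features=0.2, max_depth=1, seed=0):
    """Illustrative stochastic gradient boosting for squared-error loss."""
    rng = np.random.default_rng(seed)
    trees = []
    init = y.mean()                      # start from the mean prediction
    pred = np.full(len(y), init)
    n_rows = int(subsample * len(y))
    for _ in range(n_estimators):
        # Sample a fraction of the training rows without replacement.
        rows = rng.choice(len(y), size=n_rows, replace=False)
        residuals = y - pred
        # Each tree also considers only a fraction of the features per split.
        tree = DecisionTreeRegressor(max_depth=max_depth,
                                     max_features=max_features,
                                     random_state=int(rng.integers(1_000_000)))
        tree.fit(X[rows], residuals[rows])
        # Shrink the tree's contribution by the learning rate eta.
        pred += eta * tree.predict(X)
        trees.append(tree)
    return init, trees

def predict_sgb(init, trees, X, eta=0.1):
    # Prediction sums the shrunken contributions of all trees.
    return init + eta * sum(tree.predict(X) for tree in trees)
```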
5. Stochastic Gradient Boosting in sklearn (auto dataset)
Alright, now it's time to put this into practice. As in the last video, we'll be working with the auto dataset, which is already loaded.
Perform the same imports that were introduced in the previous lesson and split the data.
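A sketch of those steps, assuming the feature matrix X and the target y are already in memory; the exact variable names and random seed from the previous video may differ.

```python
# Imports introduced in the previous lesson.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error as MSE

# Split the auto dataset into training and test sets (seed chosen for illustration).
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=2)
```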
6. Stochastic Gradient Boosting in sklearn (auto dataset)
Now define a stochastic gradient boosting regressor named sgbt consisting of 300 decision stumps.
This can be done by setting the parameters max_depth to 1 and n_estimators to 300.
Here, the parameter subsample was set to 0.8 so that each tree samples 80% of the data for training.
Finally, the parameter max_features was set to 0.2 so that each tree uses 20% of the available features to find the best split.
Once done, fit sgbt to the training set and predict the test set labels.
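A sketch of these steps, assuming the train/test split above; the random_state value is an assumption for reproducibility.

```python
# Instantiate the stochastic gradient boosting regressor.
sgbt = GradientBoostingRegressor(max_depth=1,       # decision stumps
                                 n_estimators=300,  # 300 trees in the ensemble
                                 subsample=0.8,     # each tree sees 80% of the rows
                                 max_features=0.2,  # 20% of features per split
                                 random_state=2)    # assumed seed

# Fit sgbt to the training set and predict the test set labels.
sgbt.fit(X_train, y_train)
y_pred = sgbt.predict(X_test)
```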
7. Stochastic Gradient Boosting in sklearn (auto dataset)
Finally, compute the test set RMSE and print it.
The result shows that sgbt achieves a test set RMSE of 3.95.
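The evaluation step looks like this; the exact RMSE you obtain depends on the data split and seed, with the course reporting roughly 3.95.

```python
# Compute the test set RMSE as the square root of the mean squared error.
rmse_test = MSE(y_test, y_pred) ** 0.5
print('Test set RMSE: {:.2f}'.format(rmse_test))
```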
8. Let's practice!
Now let's try some examples.