1. Decision-Tree for Regression
Welcome back! In this video, you'll learn how to train a decision tree for a regression problem.
Recall that in regression, the target variable is continuous. In other words, the output of your model is a real value.
2. Auto-mpg Dataset
Let's motivate our discussion of regression by introducing the automobile miles-per-gallon dataset from the UCI Machine Learning Repository.
This dataset consists of 6 features corresponding to the characteristics of a car and a continuous target variable labeled mpg which stands for miles-per-gallon. Our task is to predict the mpg consumption of a car given these six features.
To simplify the problem, here the analysis is restricted to only one feature corresponding to the displacement of a car. This feature is denoted by displ.
3. Auto-mpg with one feature
A 2D scatter plot of mpg versus displ shows that mpg consumption decreases nonlinearly with displacement. Note that linear models such as linear regression would not be able to capture such a nonlinear trend.
Let's see how you can train a decision tree with scikit-learn to solve this regression problem.
4. Regression-Tree in scikit-learn
Note that the features X and the labels y are already loaded in the environment.
First, import DecisionTreeRegressor from sklearn-dot-tree, the function train_test_split() from sklearn-dot-model_selection, and mean_squared_error, aliased as MSE, from sklearn-dot-metrics.
Then, split the data into 80% train and 20% test using train_test_split().
You can now instantiate the DecisionTreeRegressor() with a maximum depth of 4 by setting the parameter max_depth to 4. In addition, set the parameter min_samples_leaf to 0-dot-1 to impose a stopping condition in which each leaf must contain at least 10% of the training data.
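Put together, these steps might look like the following minimal sketch. It assumes, as noted above, that the feature matrix X and the labels y are already loaded in the environment; the variable name SEED and its value are illustrative choices, not part of the original.

```python
# Import the regressor, the train/test splitter, and the MSE metric
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE

SEED = 3  # illustrative seed for reproducibility (not specified in the video)

# Split the data into 80% train and 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED)

# Instantiate a regression tree with a maximum depth of 4, where
# each leaf must contain at least 10% of the training data
dt = DecisionTreeRegressor(max_depth=4,
                           min_samples_leaf=0.1,
                           random_state=SEED)
```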
5. Regression-Tree in scikit-learn
Now fit dt to the training set and predict the test set labels.
To obtain the root-mean-squared-error (RMSE) of your model on the test set, proceed as follows:
- first, evaluate the mean-squared error,
- then, raise the obtained value to the power 1/2.
Finally, print dt's test set RMSE to obtain a value of 5-dot-1.
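Continuing the sketch above (so X_train, X_test, y_train, y_test, dt, and MSE come from that block), these steps might look like this; note that the printed value matches the video's 5-dot-1 only with the same data and split.

```python
# Fit the regression tree to the training set
dt.fit(X_train, y_train)

# Predict the test-set labels
y_pred = dt.predict(X_test)

# Evaluate the mean squared error, then raise it to the power 1/2
mse_dt = MSE(y_test, y_pred)
rmse_dt = mse_dt ** (1/2)

print("Test set RMSE of dt: {:.2f}".format(rmse_dt))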
6. Information Criterion for Regression-Tree
Here, it's important to note that, when a regression tree is trained on a dataset, the impurity of a node is measured using the mean-squared error of the targets in that node.
This means that the regression tree tries to find the splits that produce leaves where the target values in each leaf are, on average, as close as possible to the mean value of the labels in that leaf.
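The transcript does not reproduce the slide's formula, so the following is a standard statement of this criterion: the impurity of a node is

$$ I(\text{node}) = \mathrm{MSE}(\text{node}) = \frac{1}{N_{\text{node}}} \sum_{i \in \text{node}} \big( y^{(i)} - \hat{y}_{\text{node}} \big)^2, \qquad \hat{y}_{\text{node}} = \frac{1}{N_{\text{node}}} \sum_{i \in \text{node}} y^{(i)} $$

where $N_{\text{node}}$ is the number of training instances in the node and $\hat{y}_{\text{node}}$ is their mean target value.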
7. Prediction
As a new instance traverses the tree and reaches a certain leaf, its predicted target value 'y' is computed as the average of the target values contained in that leaf, as shown in this formula.
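The slide's formula is not reproduced in the transcript; in standard notation, the prediction for a new instance landing in a given leaf is

$$ \hat{y}_{\text{pred}}(\text{leaf}) = \frac{1}{N_{\text{leaf}}} \sum_{i \in \text{leaf}} y^{(i)} $$

that is, the mean of the training targets $y^{(i)}$ of the $N_{\text{leaf}}$ training instances in that leaf.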
8. Linear Regression vs. Regression-Tree
To highlight the importance of the flexibility of regression trees, take a look at this figure.
On the left, we have a scatter plot of the data in blue along with the predictions of a linear regression model shown in black. The linear model fails to capture the non-linear trend exhibited by the data.
On the right, we have the same scatter plot along with a red line corresponding to the predictions of the regression tree that you trained earlier. The regression tree shows greater flexibility and is able to capture the nonlinearity, though not fully.
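If you want to reproduce a comparison like this figure yourself, a sketch along these lines could work. It reuses the fitted dt from the earlier sketch and assumes X holds only the displ column; the LinearRegression model and the plotting details are illustrative additions, not part of the original.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Fit a plain linear regression on the same training data
lr = LinearRegression()
lr.fit(X_train, y_train)

# Evaluate both models on a fine grid of displacement values
grid = np.linspace(float(X_train.min()), float(X_train.max()), 200).reshape(-1, 1)

plt.scatter(X_train, y_train, color='blue', alpha=0.4, label='data')
plt.plot(grid, lr.predict(grid), color='black', label='linear regression')
plt.plot(grid, dt.predict(grid), color='red', label='regression tree')
plt.xlabel('displ')
plt.ylabel('mpg')
plt.legend()
plt.show()
```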
In the next chapter, you'll aggregate the predictions of a set of trees that are trained differently to obtain better results.
9. Let's practice!
Now it's your turn to practice.