
What is a decision tree?

1. What is a decision tree?

Because XGBoost is usually used with trees as base learners, we need to understand what an individual decision tree is, and how it works.

2. Visualizing a decision tree

Here is an example decision tree. As you can see, a single question is asked at each decision node, with only two possible answers, and at the very bottom of the tree, each leaf holds a single possible decision. In this example decision tree for whether to purchase a vehicle, the first question you ask is whether it has been road-tested. If it hasn't, you immediately decide not to buy; otherwise, you continue asking questions, such as what the vehicle's mileage is and whether its age is old or recent. Eventually, every path through the tree leads to a choice, with some paths requiring many fewer questions than others.
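
If you want to build and draw a small tree like this yourself, here is a minimal sketch using scikit-learn. The vehicle features, labels, and parameter values are invented for illustration, not taken from the course:

```python
# A minimal sketch of fitting and drawing a small decision tree.
# The "vehicle" data below is made up purely for illustration.
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Hypothetical vehicle data: road_tested (0/1), mileage, age_years -> buy (0/1)
X = pd.DataFrame({
    "road_tested": [1, 1, 0, 1, 0, 1, 1, 0],
    "mileage":     [20000, 90000, 15000, 30000, 80000, 120000, 25000, 60000],
    "age_years":   [2, 8, 1, 3, 10, 12, 2, 7],
})
y = [1, 0, 0, 1, 0, 0, 1, 0]  # 1 = buy, 0 = don't buy

tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

# Each internal node asks one binary question; each leaf holds a decision
plot_tree(tree, feature_names=list(X.columns), class_names=["don't buy", "buy"])
plt.show()
```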

3. Decision trees as base learners

The concept of a base learner will be covered more extensively later, but for now, just think of a base learner as any individual learning algorithm in an ensemble algorithm. This is important because XGBoost is itself an ensemble learning method: it combines the outputs of many models into a final prediction. As you saw in the previous slide, a decision tree is a learning method that uses a tree-like graph to model either a continuous or categorical choice given some data. It is composed of a series of binary decisions (yes/no or true/false questions) that, when answered in succession, ultimately yield a prediction about the data at hand. These predictions happen at the leaves of the tree.
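
To make the ensemble idea concrete, here is a brief sketch showing that a fitted XGBoost model is literally a collection of individual trees, each one a base learner. The dataset and the n_estimators value are assumptions chosen for illustration:

```python
# A sketch (assumed toy setup) showing XGBoost as an ensemble of trees.
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

# booster="gbtree" (the default) makes each base learner a decision tree
model = xgb.XGBClassifier(n_estimators=10, booster="gbtree")
model.fit(X, y)

# Dump the individual trees: one text description per base learner
trees = model.get_booster().get_dump()
print(len(trees))  # 10 trees, one per boosting round
```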

4. Decision trees and CART

Decision trees are constructed iteratively (that is, one binary decision at a time) until some stopping criterion is met (the depth of the tree reaching some pre-defined maximum value, for example). During construction, the tree is built one split at a time. The way a split is selected (that is, which feature to split on and where in that feature's range of values to split) can vary, but it always involves choosing a split point that better segregates the target values, putting each target category into buckets that are increasingly dominated by just one category, until all (or nearly all) values within a given split belong exclusively to one category or another. Through this process, each leaf of the decision tree ends up with a single category in the majority, or belonging exclusively to one category.
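
As a rough sketch of this split-selection idea, the snippet below scores a candidate split with Gini impurity, one common criterion for classification trees (XGBoost itself uses its own gain formula, and the numbers here are made up):

```python
# A minimal sketch of scoring one candidate split during tree construction.
import numpy as np

def gini(labels):
    """Gini impurity: 0 when a node is pure (contains only one class)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_score(feature, labels, threshold):
    """Weighted impurity of the two child nodes created by the split."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

feature = np.array([1.0, 2.0, 3.0, 8.0, 9.0, 10.0])
labels  = np.array([0, 0, 0, 1, 1, 1])

# A split at 5.0 perfectly separates the classes: weighted impurity is 0
print(split_score(feature, labels, threshold=5.0))  # 0.0
```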

5. Individual decision trees tend to overfit

Individual decision trees in general are low-bias, high-variance learning models.

6. Individual decision trees tend to overfit

That is, they are very good at learning relationships within any data you train them on, but they tend to overfit that training data and usually generalize poorly to new data.
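
You can see this symptom directly in a small sketch like the one below, which compares training and held-out accuracy for an unconstrained scikit-learn tree (the dataset choice is an assumption for illustration, not from the course):

```python
# A sketch (assumed setup) of the classic overfitting symptom: an
# unconstrained decision tree scores far better on its training data
# than on data held out from training.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(random_state=42)  # no depth limit
tree.fit(X_train, y_train)

print("train accuracy:", tree.score(X_train, y_train))  # typically 1.0
print("test accuracy: ", tree.score(X_test, y_test))    # noticeably lower
```

XGBoost uses a slightly different kind of decision tree,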

7. CART: Classification and Regression Trees

called a classification and regression tree, or CART. Whereas the leaf nodes of the decision trees described above always contain decision values, CART trees contain a real-valued score in each leaf, regardless of whether the tree is used for classification or regression. For classification problems, these real-valued scores can then be thresholded to convert them into categories if necessary.
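
As a sketch of that thresholding step (the dataset and model settings are assumptions for illustration), the snippet below pulls the raw real-valued scores out of an XGBoost classifier and converts them into class labels:

```python
# A minimal sketch of thresholding CART leaf scores into class labels.
# XGBoost sums each tree's leaf scores into a raw margin, which is then
# squashed to a probability and thresholded.
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

model = xgb.XGBClassifier(n_estimators=50).fit(X, y)

# Raw real-valued scores (summed leaf values, before the logistic transform)
margins = model.get_booster().predict(xgb.DMatrix(X), output_margin=True)

# Threshold: margin > 0 corresponds to predicted probability > 0.5
labels = (margins > 0).astype(int)
print((labels == model.predict(X)).all())  # True: identical decisions
```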

8. Let's work with some decision trees!

Awesome, let's get to working with some decision trees!
