1. The intuition behind tree-based methods
In this chapter, you will learn about ensembles of decision trees, a powerful machine learning method for both regression and classification. In the first lesson, you will review the intuition behind a decision tree.
2. Example: Predict animal intelligence from Gestation Time and Litter Size
As an example, here we have data about the litter size and gestation time in days of various mammals. We also have a measure of each species' intelligence, scaled so that human intelligence is 1.
We want to predict intelligence from a species' average litter size and gestation time. We'll use a decision tree.
3. Decision Trees
Decision trees learn rules of the form "if a and b and c, then y". Trees can express non-linear concepts such as intervals and non-monotonic relationships, and because AND-ing conditions acts like multiplying indicator variables, trees can also express certain kinds of non-additive interactions.
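As a minimal sketch of this idea (not part of the lesson's own code), here is a scikit-learn tree learning an interval concept on made-up data; the variable names and thresholds are illustrative assumptions:

```python
# Sketch (assumption: synthetic data, not the lesson's dataset).
# A shallow tree can learn an interval rule like "if 3 < x < 7, then 1".
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(200, 1))
y = np.where((x[:, 0] > 3) & (x[:, 0] < 7), 1.0, 0.0)  # an interval concept

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(x, y)
print(export_text(tree, feature_names=["x"]))  # prints the learned if/then rules
```

Printing the fitted tree with `export_text` makes the "if a and b, then y" structure visible as nested conditions.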
4. Decision Trees
A tree model for intelligence expresses that species with average litter size less than 1.15 and gestation greater than 268 days have average intelligence of 0.315, and species with litter sizes between 1.15 and 4.3 have average intelligence of 0.131.
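Written out as plain code, the two rules quoted on the slide might look like the sketch below. The thresholds and leaf values (1.15, 4.3, 268 days, 0.315, 0.131) come from the slide; the tree's remaining branches are not shown there, so this function is deliberately incomplete:

```python
# Hand-coded sketch of the two rules stated in the lesson; any branch
# not shown on the slide returns None rather than a guessed value.
def predict_intelligence(litter_size, gestation_days):
    if litter_size < 1.15 and gestation_days > 268:
        return 0.315
    elif 1.15 <= litter_size < 4.3:
        return 0.131
    else:
        return None  # other leaves of the tree are not given on the slide

print(predict_intelligence(1.0, 280))  # 0.315
```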
5. Decision Trees
Trees have an expressive, easy-to-understand concept space. In this case, a tree fits the intelligence data better than a linear model, in the sense that the predictions have a lower root mean squared error.
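The same kind of comparison can be sketched on synthetic data (an assumption; this is not the lesson's animal dataset): on a non-monotonic target, a tree reaches a lower RMSE than a linear model:

```python
# Sketch (assumption: synthetic non-monotonic data) comparing
# training RMSE of a linear model and a shallow tree.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0])  # non-monotonic relationship

lin = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

def rmse(model):
    return float(np.sqrt(np.mean((model.predict(X) - y) ** 2)))

print(f"linear RMSE: {rmse(lin):.3f}, tree RMSE: {rmse(tree):.3f}")
```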
6. Decision Trees
But trees also give only coarse-grained predictions. Our tree model can predict only 6 possible values, one per leaf, while linear models give continuous-valued predictions.
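This coarseness is easy to see in code (again a synthetic sketch, not the lesson's data): a fitted tree can emit at most as many distinct predictions as it has leaves:

```python
# Sketch (assumption: synthetic data). A tree limited to 6 leaves,
# mirroring the lesson's 6-value tree, can output at most 6 distinct
# predictions no matter how many rows it scores.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = X[:, 0] + X[:, 1]

tree = DecisionTreeRegressor(max_leaf_nodes=6, random_state=0).fit(X, y)
print(len(np.unique(tree.predict(X))))  # at most 6 distinct values
```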
7. It's Hard for Trees to Express Linear Relationships
Since trees express axis-aligned regions, it's also hard for them to express truly linear relationships
8. It's Hard for Trees to Express Linear Relationships
or any relationship that varies quickly and continuously.
9. Other Issues with Trees
You can try to make finer-grained predictions by splitting the data further to build deeper trees, but a deep tree can become overly complex and overfit the training data. Shallower trees are less likely to overfit, but their predictions are often too coarse-grained.
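The depth trade-off can be sketched like this (assumption: synthetic noisy data): a fully grown tree fits the training set almost perfectly but generalizes worse than a moderately shallow one:

```python
# Sketch (assumption: synthetic noisy data) of the depth trade-off:
# shallow vs fully grown tree, train vs test RMSE.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for depth in (3, None):  # shallow vs fully grown
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    tr = float(np.sqrt(np.mean((tree.predict(X_tr) - y_tr) ** 2)))
    te = float(np.sqrt(np.mean((tree.predict(X_te) - y_te) ** 2)))
    results[depth] = (tr, te)
    print(f"max_depth={depth}: train RMSE {tr:.2f}, test RMSE {te:.2f}")
```

The deep tree's near-zero training error paired with a higher test error is exactly the overfitting pattern described above.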
10. Ensembles of Trees
An ensemble, or a model made up of several trees, will give finer-grained predictions
11. Ensembles of Trees
and usually better quality models than a single tree. In this chapter, we will cover two ensemble methods: random forests and gradient boosted trees.
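As a preview (a sketch on synthetic data, an assumption; the chapter's own exercises come later), here are the two ensemble methods fit with scikit-learn, showing that an ensemble emits far more distinct prediction values than a single shallow tree:

```python
# Sketch (assumption: synthetic data): random forest and gradient
# boosted trees, the two ensembles this chapter covers.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=400)

single = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
boosted = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)

# averaging (forest) or summing (boosting) many trees yields many more
# distinct prediction values than one shallow tree's handful of leaves
print(len(np.unique(single.predict(X))), len(np.unique(forest.predict(X))))
```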
12. Let's practice!
Now let's do a quick tree exercise.