1. Classification-Tree Learning
Welcome back! In this video, you'll examine how a classification-tree learns from data.
2. Building Blocks of a Decision-Tree
Let's start by defining some terms.
A decision-tree is a data-structure consisting of a hierarchy of individual units called nodes.
A node is a point that involves either a question or a prediction.
3. Building Blocks of a Decision-Tree
The root is the node at which the decision-tree starts growing. It has no parent node and involves a question that gives rise to two child nodes through two branches.
An internal node is a node that has a parent. It also involves a question that gives rise to two child nodes.
Finally, a node that has no children is called a leaf. A leaf has one parent node and involves no question. It's where a prediction is made.
Recall that when a classification tree is trained on a labeled dataset, the tree learns patterns from the features in such a way as to produce the purest leaves.
In other words, the tree is trained so that, in each leaf, one class-label is predominant.
4. Prediction
In the tree diagram shown here, consider the case where an instance traverses the tree to reach the leaf on the left.
In this leaf, there are 257 instances classified as benign and 7 instances classified as malignant. As a result, the tree's prediction for this instance would be: 'benign'.
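To make that rule concrete, here is a minimal sketch of how a leaf turns its class counts into a prediction; the counts below are the ones from the diagram, and the dictionary is purely illustrative.

```python
# Illustrative class counts for the left leaf in the diagram
leaf_counts = {'benign': 257, 'malignant': 7}

# A leaf predicts its majority class
prediction = max(leaf_counts, key=leaf_counts.get)
print(prediction)  # -> 'benign'
```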
In order to understand how a classification tree produces the purest leaves possible, let's first define the concept of information gain.
5. Information Gain (IG)
The nodes of a classification tree are grown recursively; in other words, whether a node becomes an internal node or a leaf depends on the state of its predecessors.
To produce the purest leafs possible, at each node, a tree asks a question involving one feature f and a split-point sp.
But how does it know which feature and which split-point to pick? It does so by maximizing Information gain!
The tree considers that every node contains information and aims to maximize the information gain obtained after each split.
Consider the case where a node with N samples is split into a left-node with Nleft samples and a right-node with Nright samples.
6. Information Gain (IG)
The information gain for such a split is given by the formula shown here.
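With I(node) denoting the impurity criterion evaluated at a node, the information gain of splitting a parent node of N samples on feature f at split-point sp is commonly written as:

```latex
\mathrm{IG}(f, sp) = I(\text{parent})
    - \frac{N_{\text{left}}}{N}\, I(\text{left})
    - \frac{N_{\text{right}}}{N}\, I(\text{right})
```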
A question that you may have in mind here is: 'What criterion is used to measure the impurity of a node?'
Well, there are different criteria you can use, among which are the gini-index and entropy. Now that you know what information gain is, let's describe how a classification tree learns.
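As a rough illustration of what these two criteria measure, here are their textbook definitions applied to the class proportions of a node (plain NumPy, not scikit-learn internals):

```python
import numpy as np

def gini(p):
    """Gini index of a node, given its class proportions p."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Entropy of a node, given its class proportions p (0 * log 0 treated as 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# The nearly pure leaf from the prediction slide: 257 benign, 7 malignant
p = np.array([257, 7]) / 264
print(gini(p))     # close to 0 -> almost pure
print(entropy(p))  # close to 0 -> almost pure
```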
7. Classification-Tree Learning
When an unconstrained tree is trained, the nodes are grown recursively. In other words, whether a node is split further or becomes a leaf depends on the state of its predecessors.
At a non-leaf node, the data is split based on feature f and split-point sp in such a way to maximize information gain.
If the information gain obtained by splitting a node is zero, the node is declared a leaf.
Keep in mind that these rules are for unconstrained trees. If you constrain the maximum depth of a tree to 2, for example, all nodes having a depth of 2 will be declared leaves even if the information gain obtained by splitting such nodes is not zero.
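Putting these rules together, here is a minimal, self-contained sketch of the recursive growing procedure; it illustrates the logic described above using the gini-index as the impurity criterion, and is not how scikit-learn is actually implemented.

```python
import numpy as np

def gini(y):
    """Impurity (gini index) of a set of labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return the (feature, split-point, gain) that maximizes information gain."""
    n = len(y)
    best_f, best_sp, best_gain = None, None, 0.0
    for f in range(X.shape[1]):
        for sp in np.unique(X[:, f]):
            left, right = y[X[:, f] <= sp], y[X[:, f] > sp]
            if len(left) == 0 or len(right) == 0:
                continue
            gain = gini(y) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
            if gain > best_gain:
                best_f, best_sp, best_gain = f, sp, gain
    return best_f, best_sp, best_gain

def grow(X, y, depth=0, max_depth=None):
    """Grow a node: keep splitting while information gain is positive and depth allows."""
    f, sp, gain = best_split(X, y)
    if gain == 0.0 or (max_depth is not None and depth >= max_depth):
        labels, counts = np.unique(y, return_counts=True)
        return {'leaf': labels[np.argmax(counts)]}   # predict the majority class
    mask = X[:, f] <= sp
    return {'feature': f, 'split_point': sp,
            'left':  grow(X[mask], y[mask], depth + 1, max_depth),
            'right': grow(X[~mask], y[~mask], depth + 1, max_depth)}
```

Passing a max_depth value short-circuits the recursion exactly as described: nodes at the depth limit become leaves even when a positive-gain split still exists.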
8. Information Criterion in scikit-learn (Breast Cancer dataset)
Revisiting the 2D breast-cancer dataset from the previous lesson, you can set the information criterion of dt to the gini-index by setting the criterion parameter to 'gini' as shown on the last line here.
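A rough sketch of that code, using the two features 'mean radius' and 'mean concave points' of scikit-learn's built-in breast-cancer data as a stand-in for the previous lesson's 2D dataset and a random seed of 1 (both assumptions, since the original code isn't reproduced here), might look like this:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the 2D dataset of the previous lesson: two features of the
# Wisconsin breast-cancer data shipped with scikit-learn
data = load_breast_cancer()
cols = [list(data.feature_names).index(f)
        for f in ('mean radius', 'mean concave points')]
X, y = data.data[:, cols], data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Set the information criterion to the gini-index via the criterion parameter
dt = DecisionTreeClassifier(criterion='gini', random_state=1)
```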
9. Information Criterion in scikit-learn
Now fit dt to the training set and predict the test set labels. Then determine dt's test set accuracy, which evaluates to about 92%.
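Continuing the sketch above, the fit, predict, and evaluate steps might look like this; the 92% figure is the result reported on the slide, and the exact value depends on the data split used.

```python
from sklearn.metrics import accuracy_score

# Fit dt to the training set, predict the test set labels, and evaluate accuracy
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
print(accuracy_score(y_test, y_pred))  # about 0.92 on the slide's split
```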
10. Let's practice!
Now it's your turn to practice.