Train your first classification tree
In this exercise you'll work with the Wisconsin Breast Cancer Dataset from the UCI machine learning repository. You'll predict whether a tumor is malignant or benign based on two features: the mean radius of the tumor (radius_mean) and its mean number of concave points (concave points_mean).
The dataset is already loaded in your workspace and is split into 80% train and 20% test. The feature matrices are assigned to X_train and X_test, while the arrays of labels are assigned to y_train and y_test, where class 1 corresponds to a malignant tumor and class 0 corresponds to a benign tumor. To obtain reproducible results, we also defined a variable called SEED, which is set to 1.
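Outside the course workspace, a comparable setup could be reproduced roughly as follows. This is only a sketch under stated assumptions: it uses the copy of the dataset bundled with scikit-learn rather than the course's own file, so the two columns are named "mean radius" and "mean concave points" instead of radius_mean and concave points_mean, and the labels have to be flipped because scikit-learn encodes malignant as 0 and benign as 1. The exercise does not say whether the course's split was stratified, so no stratification is applied here.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

SEED = 1  # same value as the SEED variable described in the exercise

# Assumption: scikit-learn's bundled copy stands in for the course's data file.
data = load_breast_cancer()
names = list(data.feature_names)

# Keep only the two features used in the exercise.
cols = [names.index("mean radius"), names.index("mean concave points")]
X = data.data[:, cols]

# scikit-learn uses 0 = malignant, 1 = benign; flip the labels so that
# 1 = malignant and 0 = benign, as in the exercise.
y = 1 - data.target

# 80% train / 20% test, seeded for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED
)

print(X_train.shape, X_test.shape)
```

The arrays produced this way may not match the course's preloaded X_train and X_test row for row, since the exact split used there is not shown.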
This exercise is part of the course Machine Learning with Tree-Based Models in Python.
Exercise instructions
- Import DecisionTreeClassifier from sklearn.tree.
- Instantiate a DecisionTreeClassifier dt of maximum depth equal to 6.
- Fit dt to the training set.
- Predict the test set labels and assign the result to y_pred.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import DecisionTreeClassifier from sklearn.tree
from ____.____ import ____
# Instantiate a DecisionTreeClassifier 'dt' with a maximum depth of 6
dt = ____(____=____, random_state=SEED)
# Fit dt to the training set
____.____(____, ____)
# Predict test set labels
y_pred = ____.____(____)
print(y_pred[0:5])
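
For reference, one way the blanks in the template might be filled in is sketched below. The first part of the block is only a stand-in for the preloaded workspace: it assumes scikit-learn's bundled copy of the dataset (columns "mean radius" and "mean concave points", labels flipped so that 1 = malignant) and an unstratified 80/20 split, none of which is confirmed by the exercise. The last four statements follow the instructions directly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

SEED = 1

# Stand-in for the preloaded workspace (assumption, not the course's own data file).
data = load_breast_cancer()
names = list(data.feature_names)
cols = [names.index("mean radius"), names.index("mean concave points")]
X, y = data.data[:, cols], 1 - data.target  # flip so that 1 = malignant
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED
)

# Instantiate a DecisionTreeClassifier 'dt' with a maximum depth of 6
dt = DecisionTreeClassifier(max_depth=6, random_state=SEED)

# Fit dt to the training set
dt.fit(X_train, y_train)

# Predict test set labels
y_pred = dt.predict(X_test)
print(y_pred[0:5])
```

Passing random_state=SEED keeps the fitted tree reproducible between runs, because scikit-learn breaks ties between equally good splits at random. The printed labels depend on the data and split used, so they may differ from what the course workspace shows.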