Decision trees
Your task in this exercise is to make a simple decision tree using scikit-learn's DecisionTreeClassifier on the breast cancer dataset that comes pre-loaded with scikit-learn.

This dataset contains numeric measurements of various dimensions of individual tumors (such as perimeter and texture) from breast biopsies and a single outcome value (the tumor is either malignant or benign).

We've preloaded the dataset of samples (measurements) into X and the target values per tumor into y. Now, you have to split the complete dataset into training and testing sets, and then train a DecisionTreeClassifier. You'll specify a parameter called max_depth. Many other parameters can be modified within this model, and you can check them all in the scikit-learn documentation for DecisionTreeClassifier.
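In the exercise environment X and y are already defined for you, but if you want to reproduce the setup on your own machine, a minimal sketch for loading the same data could look like this (it assumes only that scikit-learn is installed; return_X_y=True returns the feature matrix and target array directly):

# Load the breast cancer dataset into a feature matrix X and a target vector y.
# In the exercise itself this step has already been done for you.
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
print(X.shape)  # (569, 30): 569 tumors, 30 numeric measurements each
print(y.shape)  # (569,): 0 = malignant, 1 = benign in scikit-learn's encoding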
This exercise is part of the course Extreme Gradient Boosting with XGBoost.
Instructions
- Import train_test_split from sklearn.model_selection and DecisionTreeClassifier from sklearn.tree.
- Create training and test sets such that 20% of the data is used for testing. Use a random_state of 123.
- Instantiate a DecisionTreeClassifier called dt_clf_4 with a max_depth of 4. This parameter specifies the maximum number of successive split points you can have before reaching a leaf node (a short illustration follows this list).
- Fit the classifier to the training set and predict the labels of the test set.
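To build intuition for what the max_depth constraint does, here is a minimal sketch. It is not part of the exercise solution: it loads the data itself instead of using the preloaded X and y, and get_depth() requires scikit-learn 0.21 or newer.

# max_depth caps how many successive splits a sample can pass through
# before it reaches a leaf.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_breast_cancer(return_X_y=True)

# An unconstrained tree keeps splitting until its leaves are pure.
deep_tree = DecisionTreeClassifier(random_state=123).fit(X_demo, y_demo)

# With max_depth=4, growth stops after at most 4 successive splits.
shallow_tree = DecisionTreeClassifier(max_depth=4, random_state=123).fit(X_demo, y_demo)

print("unconstrained depth:", deep_tree.get_depth())  # usually well above 4 on this data
print("capped depth:", shallow_tree.get_depth())       # never more than 4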
Interactive hands-on exercise
Try this exercise by completing this sample code.
# Import the necessary modules
____
____
# Create the training and test sets
X_train, X_test, y_train, y_test = ____(____, ____, test_size=____, random_state=____)
# Instantiate the classifier: dt_clf_4
dt_clf_4 = ____
# Fit the classifier to the training set
____
# Predict the labels of the test set: y_pred_4
y_pred_4 = ____
# Compute the accuracy of the predictions: accuracy
accuracy = float(np.sum(y_pred_4==y_test))/y_test.shape[0]
print("accuracy:", accuracy)