Get startedGet started for free

Decision trees

Your task in this exercise is to make a simple decision tree using scikit-learn's DecisionTreeClassifier on the breast cancer dataset that comes pre-loaded with scikit-learn.

This dataset contains numeric measurements of various dimensions of individual tumors (such as perimeter and texture) from breast biopsies and a single outcome value (the tumor is either malignant, or benign).

We've preloaded the dataset of samples (measurements) into X and the target values per tumor into y. Now, you have to split the complete dataset into training and testing sets, and then train a DecisionTreeClassifier. You'll specify a parameter called max_depth. Many other parameters can be modified within this model, and you can check all of them out here.

This exercise is part of the course

Extreme Gradient Boosting with XGBoost

View Course

Exercise instructions

  • Import:
    • train_test_split from sklearn.model_selection.
    • DecisionTreeClassifier from sklearn.tree.
  • Create training and test sets such that 20% of the data is used for testing. Use a random_state of 123.
  • Instantiate a DecisionTreeClassifier called dt_clf_4 with a max_depth of 4. This parameter specifies the maximum number of successive split points you can have before reaching a leaf node.
  • Fit the classifier to the training set and predict the labels of the test set.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the necessary modules
____
____

# Create the training and test sets
X_train, X_test, y_train, y_test = ____(____, ____, test_size=____, random_state=____)

# Instantiate the classifier: dt_clf_4
dt_clf_4 = ____

# Fit the classifier to the training set
____

# Predict the labels of the test set: y_pred_4
y_pred_4 = ____

# Compute the accuracy of the predictions: accuracy
accuracy = float(np.sum(y_pred_4==y_test))/y_test.shape[0]
print("accuracy:", accuracy)
Edit and Run Code