Exercise

Decision trees

Your task in this exercise is to make a simple decision tree using scikit-learn's DecisionTreeClassifier on the breast cancer dataset that comes pre-loaded with scikit-learn.

This dataset contains numeric measurements of individual tumors from breast biopsies (such as perimeter and texture) and a single outcome value per tumor: either malignant or benign.

We've preloaded the samples (measurements) into X and the target value for each tumor into y. Your job is to split the complete dataset into training and testing sets, and then train a DecisionTreeClassifier. You'll specify a parameter called max_depth. Many other parameters of this model can be modified; they are all listed in the scikit-learn documentation for DecisionTreeClassifier.
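If you want to reproduce the preloaded X and y outside the exercise environment, the sketch below shows one way to do it. It assumes the exercise uses scikit-learn's load_breast_cancer loader with return_X_y=True; the exercise does not say exactly how X and y were created, so treat this as an approximation.

```python
from sklearn.datasets import load_breast_cancer

# Assumption: X and y come from scikit-learn's bundled breast cancer dataset.
# return_X_y=True returns the feature matrix and target vector as NumPy arrays.
X, y = load_breast_cancer(return_X_y=True)

print(X.shape)  # (569, 30): 569 tumors, 30 numeric measurements each
print(y.shape)  # (569,): one label per tumor
```

In scikit-learn's copy of this dataset, the label 0 stands for malignant and 1 for benign.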

Instructions
  • Import:
    • train_test_split from sklearn.model_selection.
    • DecisionTreeClassifier from sklearn.tree.
  • Create training and test sets such that 20% of the data is used for testing. Use a random_state of 123.
  • Instantiate a DecisionTreeClassifier called dt_clf_4 with a max_depth of 4. This parameter specifies the maximum number of successive split points you can have before reaching a leaf node.
  • Fit the classifier to the training set and predict the labels of the test set.
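The steps above can be sketched as follows. This is one possible solution, not necessarily the exercise's reference answer. The names X_train, X_test, y_train, y_test, y_pred_4 and accuracy are illustrative choices, and the dataset is loaded explicitly here only so the snippet runs on its own (in the exercise, X and y already exist).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: stands in for the X and y that the exercise preloads.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the samples for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# Limit the tree to at most 4 successive splits between the root and any leaf.
dt_clf_4 = DecisionTreeClassifier(max_depth=4)

# Learn the split rules from the training set.
dt_clf_4.fit(X_train, y_train)

# Predict a label for every sample in the test set.
y_pred_4 = dt_clf_4.predict(X_test)

# Fraction of test samples whose predicted label matches the true label.
accuracy = float((y_pred_4 == y_test).mean())
print("accuracy:", accuracy)
```

The instructions do not ask for a random_state on the classifier itself, so the fitted tree, and therefore the accuracy, may differ slightly between runs. Passing random_state to DecisionTreeClassifier as well would make the result repeatable.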