
Exercise

Creating your first decision tree

You will use the scikit-learn and numpy libraries to build your first decision tree. scikit-learn provides tree objects through the DecisionTreeClassifier class. The methods we will use take numpy arrays as inputs, so we need to create those from the DataFrame we already have. To build a decision tree we need the following:

  • target: A one-dimensional numpy array containing the target/response from the train data (Survived in your case).
  • features: A multidimensional numpy array containing the features/predictors from the train data (e.g. Sex, Age).

Take a look at the sample code below to see what this would look like:

target = train["Survived"].values

features = train[["Sex", "Age"]].values

my_tree = tree.DecisionTreeClassifier()

my_tree = my_tree.fit(features, target)

One way to quickly assess the result of your decision tree is to look at the importance of the features it uses. You can do this by requesting the .feature_importances_ attribute of your tree object. Another quick metric is the mean accuracy, which you can compute by calling the .score() method with the features and target arrays as arguments.
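Continuing the sample above, here is a minimal sketch of both checks, assuming my_tree, features, and target are defined as shown:

# Relative importance of each feature used in the tree's splits
print(my_tree.feature_importances_)

# Mean accuracy of the tree on the training data
print(my_tree.score(features, target))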

Ok, time for you to build your first decision tree in Python! The train and testing data from chapter 1 are available in your workspace.

Instructions

100 XP
  • Build the target and features_one numpy arrays. The target will be based on the Survived column in train. The features array will be based on the variables Passenger Class, Sex, Age, and Passenger Fare.
  • Build a decision tree my_tree_one to predict survival using features_one and target.
  • Look at the importance of the features in your tree and compute the score (one possible approach is sketched below).
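If you get stuck, the sketch below outlines one way to complete the exercise. It assumes the train DataFrame uses the Kaggle Titanic column names Pclass and Fare for passenger class and passenger fare, and that Sex was converted to a numeric encoding in an earlier exercise.

from sklearn import tree

# Build the target and features_one numpy arrays
target = train["Survived"].values
features_one = train[["Pclass", "Sex", "Age", "Fare"]].values  # column names assumed from the Kaggle Titanic data

# Fit the decision tree on features_one and target
my_tree_one = tree.DecisionTreeClassifier()
my_tree_one = my_tree_one.fit(features_one, target)

# Inspect the feature importances and the mean accuracy on the train data
print(my_tree_one.feature_importances_)
print(my_tree_one.score(features_one, target))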