Exercise

Creating your first decision tree

The rpart package provides the rpart() function, which you can use to build your first decision tree. The function takes several arguments:

  • formula: specifies the variable of interest and the variables used for prediction (e.g. formula = Survived ~ Sex + Age).
  • data: the data set used to build the decision tree (here train).
  • method: the type of prediction you want. We want to predict a categorical variable, so we use classification: method = "class".

Your call could look like this:

my_tree <- rpart(Survived ~ Sex + Age,
                 data = train,
                 method = "class")

To visualize the resulting tree, you can use plot(my_tree) and text(my_tree). The resulting graphs will not be that informative, but R has packages to make them fancier: rattle, rpart.plot, and RColorBrewer.
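
As a rough illustration of the plotting step, here is a minimal sketch that assumes the rpart package is installed. The toy data frame and the name toy_tree are made up so that the snippet runs on its own; in the exercise you would use train instead. The fancier plot is attempted only if rattle, rpart.plot, and RColorBrewer happen to be installed.

```r
library(rpart)

# Hypothetical toy data, invented for illustration only (not the Titanic train set)
set.seed(1)
toy <- data.frame(
  Sex = factor(rep(c("male", "female"), each = 20)),
  Age = round(runif(40, min = 1, max = 70))
)
toy$Survived <- factor(ifelse(toy$Sex == "female", 1, 0))

# Same call shape as in the text: formula, data, method
toy_tree <- rpart(Survived ~ Sex + Age,
                  data = toy,
                  method = "class")

# Base graphics: plot() draws the branches, text() adds the split labels
plot(toy_tree, margin = 0.1)
text(toy_tree, use.n = TRUE)

# Optional fancier plot, only attempted when the extra packages are available
extra <- c("rattle", "rpart.plot", "RColorBrewer")
if (all(vapply(extra, requireNamespace, logical(1), quietly = TRUE))) {
  library(rattle)
  library(rpart.plot)
  library(RColorBrewer)
  fancyRpartPlot(toy_tree)
}
```

Note that text() only adds labels to a tree that plot() has already drawn, so the two calls have to run in that order.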

Instructions

  • Build a decision tree my_tree_two:
      • You want to predict Survived based on Pclass, Sex, Age, SibSp, Parch, Fare and Embarked.
      • Use the train data to build the tree.
      • Use method to specify that you want to classify.
  • Visualize my_tree_two with plot() and text().
  • Load the R packages rattle, rpart.plot, and RColorBrewer.
  • Use fancyRpartPlot(my_tree_two) to create a much fancier visualization of your tree.