Kaggle R Tutorial on Machine Learning

Exercise

Re-engineering our Titanic data set

Data Science is an art that benefits from a human element. Enter feature engineering: creatively constructing new features by combining the existing variables.

While feature engineering is a discipline in itself, too broad to be covered here in detail, let's have a look at a simple example and create a new predictive attribute: family_size.

A plausible assumption is that larger families need more time to gather on a sinking ship, and hence have a lower chance of surviving. Family size is determined by the variables SibSp and Parch, which indicate the number of siblings/spouses and parents/children a passenger is traveling with. So when doing feature engineering, you add a new variable family_size, the sum of SibSp and Parch plus one (the passenger themselves), to both the train and test sets.

Instructions

100 XP
  • Create a new train set train_two that differs from train only by having an extra column with your feature engineered variable family_size.
  • Finish the command to build my_tree_four: The formula in rpart() should include family_size and the tree should be used on the train_two data.
  • Visualize your new decision tree with fancyRpartPlot().
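The three steps above can be sketched as follows. This is a minimal, self-contained illustration: a tiny made-up data frame stands in for the Kaggle train set, and the predictor columns used in the formula are an assumption (the course's actual formula may include more variables such as Age, Fare, and Embarked). rpart ships with R, while fancyRpartPlot() comes from the rattle package, so the plotting call is shown commented out.

```r
library(rpart)

# Hypothetical stand-in for the Titanic train set (not the real data)
train <- data.frame(
  Survived = factor(c(0, 1, 1, 0, 1, 0, 0, 1)),
  Pclass   = c(3, 1, 2, 3, 1, 3, 2, 1),
  Sex      = factor(c("male", "female", "female", "male",
                      "female", "male", "male", "female")),
  SibSp    = c(1, 1, 0, 4, 0, 0, 2, 1),
  Parch    = c(0, 0, 2, 1, 0, 0, 3, 2)
)

# Step 1: copy train and add the engineered family_size column
train_two <- train
train_two$family_size <- train_two$SibSp + train_two$Parch + 1

# Step 2: grow a classification tree that includes family_size
my_tree_four <- rpart(Survived ~ Pclass + Sex + family_size,
                      data = train_two, method = "class")

# Step 3: visualize the tree (requires the rattle package)
# library(rattle)
# fancyRpartPlot(my_tree_four)
```

The same family_size column would be added to the test set before making predictions, since a model can only score rows that contain every variable in its formula.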