Exercise

Feature engineering for our Titanic data set

Data science is an art that benefits from a human element. Enter feature engineering: creatively building your own features by combining existing variables.

While feature engineering is a discipline in itself, too broad to be covered here in detail, you will have a look at a simple example by creating your own new predictive attribute: family_size.

A plausible assumption is that larger families need more time to get together on a sinking ship, and hence have a lower probability of surviving. Family size is determined by the variables SibSp (siblings and spouses aboard) and Parch (parents and children aboard), which together indicate the number of family members a passenger is traveling with. So when doing feature engineering, you add a new variable family_size, which is the sum of SibSp and Parch plus one (the passenger themselves), to the test and train set.
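As a minimal sketch of that calculation, assuming the data lives in a pandas DataFrame named train with numeric SibSp and Parch columns, the new column could be computed as below. The three rows are made up for illustration and are not real Titanic records.

```python
import pandas as pd

# Toy rows for illustration only (not real Titanic records).
train = pd.DataFrame({
    "SibSp": [1, 0, 3],
    "Parch": [0, 0, 2],
})

# Work on a copy so the original frame is left untouched.
train_two = train.copy()

# family_size = siblings/spouses + parents/children + the passenger themselves.
train_two["family_size"] = train_two["SibSp"] + train_two["Parch"] + 1

print(train_two["family_size"].tolist())
```

A passenger traveling alone ends up with a family_size of 1 rather than 0, which is what the "plus one" term is for.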

Instructions
  • Create a new train set train_two that differs from train only by having an extra column with your feature-engineered variable family_size.
  • Build features_three from Pclass, Sex, Age, Fare, SibSp and Parch, plus your feature-engineered variable family_size.
  • Create a new decision tree my_tree_three and fit it with your new feature set features_three. Then check the score of the decision tree.
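
One possible way to work through these steps is sketched below, assuming scikit-learn's DecisionTreeClassifier and a train DataFrame whose columns are already numeric, with Sex encoded as 0/1 and a Survived target column. The six rows are invented for illustration, and the names feature_names and target are hypothetical helpers, not part of the exercise.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up, already-numeric rows (Sex encoded as 0/1); not real Titanic data.
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1],
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Sex": [0, 1, 1, 0, 0, 1],
    "Age": [22.0, 38.0, 26.0, 54.0, 2.0, 14.0],
    "Fare": [7.25, 71.28, 7.93, 51.86, 21.08, 30.07],
    "SibSp": [1, 1, 0, 0, 3, 1],
    "Parch": [0, 0, 0, 0, 1, 0],
})

# Step 1: copy train and add the engineered column.
train_two = train.copy()
train_two["family_size"] = train_two["SibSp"] + train_two["Parch"] + 1

# Step 2: assemble the feature matrix, including family_size.
# feature_names and target are hypothetical helper names.
feature_names = ["Pclass", "Sex", "Age", "Fare", "SibSp", "Parch", "family_size"]
features_three = train_two[feature_names].values
target = train_two["Survived"].values

# Step 3: fit the tree and check its score on the training data.
my_tree_three = DecisionTreeClassifier(random_state=1)
my_tree_three.fit(features_three, target)

print(my_tree_three.score(features_three, target))
```

Note that score here is the mean accuracy on the same rows the tree was fitted on, so it tends to be optimistic; it says little about how the tree would perform on the test set.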