Does age play a role?

Another variable that could influence survival is age: it is plausible that children were saved first. You can test this by creating a new column, Child, that holds a categorical variable.

To add this new variable you need to do two things:

  1. Create the new column with the $ operator. For example, to create a new column named lucky:

    train$lucky <- NA
    
  2. Provide the values for each observation (i.e., row). In this exercise they depend on the passenger's age. You can use a logical (boolean) test inside square brackets for this. For example, to set the lucky column to TRUE for passengers who survived the disaster and to FALSE for everyone else, you could use:

    train$lucky[train$Survived == 1] <- TRUE
    train$lucky[train$Survived == 0] <- FALSE
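To see both steps end to end without the Titanic data, here is a minimal sketch on a small made-up data frame. The name `toy` and its Survived values are hypothetical stand-ins for `train`, chosen so the sketch does not overwrite your loaded data.

```r
# Hypothetical toy data standing in for the train set:
# a single Survived column with made-up values (1 = survived, 0 = died)
toy <- data.frame(Survived = c(1, 0, 0, 1, 0))

# Step 1: create the new column with $; every row starts as NA
toy$lucky <- NA

# Step 2: fill it in, using logical tests inside square brackets
# to select which rows receive each value
toy$lucky[toy$Survived == 1] <- TRUE
toy$lucky[toy$Survived == 0] <- FALSE

print(toy)
```

Every row matched one of the two tests, so no NA remains in lucky; rows matching neither test would keep the default NA from step 1.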
    

This exercise is part of the course Kaggle R Tutorial on Machine Learning.

Exercise instructions

  • Finish the sample code below to create a new column, Child, whose default value is NA, whose value is 1 if the passenger's Age is < 18 years, and whose value is 0 if the passenger's Age is >= 18 years.
  • Do a two-way comparison of the number of children vs. adults who survived, using row-wise proportions.

Hands-on interactive exercise

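The sample code below completes the exercise, filling each blank according to the instructions above.

# Your train and test sets are still loaded
str(train)
str(test)

# Create the column Child: 1 for children (Age < 18), 0 for adults (Age >= 18)
train$Child <- NA
train$Child[train$Age < 18] <- 1
train$Child[train$Age >= 18] <- 0
# Passengers with a missing Age keep the default value NA

# Two-way comparison: survival proportions within each row (adults = 0, children = 1)
prop.table(table(train$Child, train$Survived), 1)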
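To make "row-wise proportions" concrete, here is a minimal sketch on a tiny made-up data set. The name `toy` and all ages and outcomes are hypothetical; one Age is deliberately missing, as happens in the real train set. It builds Child the same way as the exercise, then shows that `prop.table(..., 1)` divides each cell by its row total.

```r
# Hypothetical toy data: values are invented for illustration only
toy <- data.frame(
  Age      = c(4, 30, 12, 45, NA, 16, 60),
  Survived = c(1, 0, 1, 0, 0, 0, 1)
)

# Same construction as the exercise; the NA Age is skipped by both tests,
# so that passenger keeps the default NA
toy$Child <- NA
toy$Child[toy$Age < 18] <- 1
toy$Child[toy$Age >= 18] <- 0

# Counts: rows are Child (0 = adult, 1 = child), columns are Survived.
# By default table() leaves out the NA in Child.
counts <- table(toy$Child, toy$Survived)

# margin = 1: each cell is divided by its row total, so every row sums to 1
props <- prop.table(counts, 1)
print(props)
```

With these made-up values, two of the three children survived and one of the three adults did, so the child row reads about 0.33 / 0.67 and the adult row about 0.67 / 0.33. Using margin 2 instead would give column-wise proportions.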