Does age play a role?
Another variable that could influence survival is age: it is likely that children were saved first. You can test this by creating a new column with a categorical variable, Child, that indicates whether a passenger was a child.
To add this new variable you need to do two things:
- Create a new column, which is done through the $ operator. For example, to create a new column called lucky:
  train$lucky <- NA
- Provide the values for each observation (i.e., row) based on the age of the passenger. You can use a boolean test inside square brackets for this. For example, to set the lucky column to TRUE for passengers that survived the disaster and to FALSE for the others, you could use:
  train$lucky[train$Survived == 1] <- TRUE
  train$lucky[train$Survived == 0] <- FALSE
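Below is a minimal, self-contained sketch of these two steps. It runs on a small made-up data frame that stands in for the Titanic train set; the values are hypothetical and not taken from the Kaggle data.

```r
# Hypothetical toy data standing in for the Titanic train set
train <- data.frame(
  PassengerId = 1:4,
  Survived    = c(1, 0, 0, 1)
)

# Step 1: create the new column with the $ operator (every row starts as NA)
train$lucky <- NA

# Step 2: fill it in with boolean tests inside square brackets
train$lucky[train$Survived == 1] <- TRUE
train$lucky[train$Survived == 0] <- FALSE

print(train)
```

Each boolean test selects only the rows where it is TRUE, so the two assignments together cover every passenger whose Survived value is 0 or 1.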
This exercise is part of the course Kaggle R Tutorial on Machine Learning.
Exercise instructions
- Finish the code below to create a new column Child:
  - whose default value is NA,
  - whose value is 1 if the passenger's Age is < 18 years, and
  - whose value is 0 when the passenger's Age is >= 18 years.
- Do a two-way comparison on the number of children vs adults that survived, in row-wise proportions.
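The instructions above leave one case implicit: passengers whose Age is missing. The sketch below uses a hypothetical four-row data frame to show that such passengers keep the default NA. Both Age comparisons return NA for them, and R does not select NA positions when a single value is assigned.

```r
# Hypothetical toy data: four passengers, one with an unknown age
train <- data.frame(
  Age      = c(4, 35, NA, 17),
  Survived = c(1, 0, 1, 1)
)

# Default value: NA for everybody
train$Child <- NA

# 1 for passengers younger than 18, 0 for passengers aged 18 or older.
# For the row with a missing Age, both comparisons evaluate to NA; with a
# length-one replacement value, R leaves NA positions untouched, so that row stays NA.
train$Child[train$Age < 18]  <- 1
train$Child[train$Age >= 18] <- 0

print(train$Child)
```

This is why the column starts as NA rather than 0: a passenger of unknown age is not silently counted as an adult.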
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Your train and test set are still loaded in
str(train)
str(test)
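# Create the column Child, and indicate whether child or no child
train$Child <- NA
train$Child[train$Age < 18] <- 1
train$Child[train$Age >= 18] <- 0
# Two-way comparison: row-wise proportions of survival for adults (0) and children (1)
prop.table(table(train$Child, train$Survived), 1)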
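Below is a short sketch of the two-way comparison on hypothetical toy data. It assumes the Child column is coded 1/0 as described above. prop.table() with margin 1 divides each count by its row total, so each row (adults, children) sums to 1.

```r
# Hypothetical toy data with the Child column already filled in
train <- data.frame(
  Child    = c(1, 1, 0, 0, 0, NA),
  Survived = c(1, 0, 1, 0, 0, 1)
)

# Counts: rows are Child (0 = adult, 1 = child), columns are Survived (0/1).
# table() drops NA values by default, so the passenger with unknown age is excluded.
counts <- table(train$Child, train$Survived)

# margin = 1 divides each cell by its row total, giving row-wise proportions
props <- prop.table(counts, 1)
print(props)
```

With these made-up values, the adult row shows 2/3 died and 1/3 survived. The child row shows 0.5 and 0.5.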