Predict and submit to Kaggle
To send a submission to Kaggle you need to predict the survival rates for the observations in the test set. In the previous chapter we created rather amateurish predictions with manual subsetting operations. Now that we have a decision tree, we can use the predict() function to generate our answer:
predict(my_tree_two, test, type = "class")
Here, my_tree_two is the tree model you've just built, test is the data set to build the predictions for, and type = "class" specifies that you want to classify observations.
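To see what type = "class" changes, here is a minimal sketch on made-up data. It assumes the tree is an rpart model; the names toy_train, toy_test and toy_tree and the data itself are hypothetical stand-ins, not the course's my_tree_two and test objects.

```r
library(rpart)

# Hypothetical toy data (not the Titanic set): survival is fully determined by Sex
toy_train <- data.frame(
  Survived = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)),
  Sex      = factor(rep(c("male", "female"), each = 5))
)

# minsplit is lowered so that a split is attempted on only 10 rows;
# xval = 0 skips cross-validation, which is not needed for this illustration
toy_tree <- rpart(Survived ~ Sex, data = toy_train, method = "class",
                  control = rpart.control(minsplit = 2, xval = 0))

# New observations to predict; factor levels must match the training data
toy_test <- data.frame(
  Sex = factor(c("female", "male", "female"), levels = levels(toy_train$Sex))
)

# type = "class" returns one predicted label per row, as a factor
class_pred <- predict(toy_tree, newdata = toy_test, type = "class")

# type = "prob" returns a matrix with one probability column per class
prob_pred <- predict(toy_tree, newdata = toy_test, type = "prob")

print(class_pred)
print(prob_pred)
```

For a Kaggle submission you want the labels, which is why the call above passes type = "class" rather than asking for probabilities.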
Before you can submit to Kaggle, you'll have to convert your predictions to a CSV file with exactly 418 entries and 2 columns: PassengerId and Survived. Head over to the instructions to get to it!
This exercise is part of the course
Kaggle R Tutorial on Machine Learning
Exercise instructions
- Use predict() as specified above to make predictions on the test set. Assign the result to my_prediction.
- Finish the data.frame() call to create the my_solution data frame that is in line with Kaggle's standards:
  - The PassengerId column should contain the PassengerId column of test.
  - The Survived column should contain the values in my_prediction.
- Check that my_solution has 418 entries with nrow().
- Finish the write.csv() call to write the data in my_solution to "my_solution.csv". Don't remove the row.names = FALSE argument.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# my_tree_two and test are available in the workspace
# Make predictions on the test set
my_prediction <- predict(___, newdata = ___, type = ___)
# Finish the data.frame() call
my_solution <- data.frame(PassengerId = ___, Survived = ___)
# Use nrow() on my_solution
# Finish the write.csv() call
write.csv(___, file = ___, row.names = FALSE)
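The same sequence of steps can be sketched end to end on made-up data. This is an illustration only, assuming an rpart tree; toy_train, toy_test, toy_tree, toy_prediction, toy_solution and the file name toy_solution.csv are hypothetical, and the toy test set has 4 rows rather than the 418 in the real one.

```r
library(rpart)

# Hypothetical training data: survival is fully determined by Sex
toy_train <- data.frame(
  Survived = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)),
  Sex      = factor(rep(c("male", "female"), each = 5))
)

# Hypothetical test data with an ID column, as in a Kaggle test set
toy_test <- data.frame(
  PassengerId = 892:895,
  Sex = factor(c("male", "female", "female", "male"),
               levels = levels(toy_train$Sex))
)

# Fit a small classification tree (minsplit lowered for the tiny sample)
toy_tree <- rpart(Survived ~ Sex, data = toy_train, method = "class",
                  control = rpart.control(minsplit = 2, xval = 0))

# Step 1: predict class labels for the test rows
toy_prediction <- predict(toy_tree, newdata = toy_test, type = "class")

# Step 2: combine IDs and predictions into a two-column data frame
toy_solution <- data.frame(PassengerId = toy_test$PassengerId,
                           Survived    = toy_prediction)

# Step 3: check that there is one row per test observation
stopifnot(nrow(toy_solution) == nrow(toy_test))

# Step 4: write the CSV to a temporary location, without row names
out_file <- file.path(tempdir(), "toy_solution.csv")
write.csv(toy_solution, file = out_file, row.names = FALSE)

# Read the file back to confirm it has exactly the two expected columns
check <- read.csv(out_file)
print(names(check))
```

The row.names = FALSE argument matters because write.csv() writes row names as an extra leading column by default, which would give the file three columns instead of the two that Kaggle expects.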