Replicate validation score

You've seen both validation and Public Leaderboard scores in the video. However, the code examples shown there cover only the test data. To get the validation scores, you have to repeat the same process on the holdout set.

Throughout this chapter, you will work with New York City Taxi competition data. The problem is to predict the fare amount for a taxi ride in New York City. The competition metric is the root mean squared error.
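As a reminder of what the metric measures, here is a minimal sketch of root mean squared error in NumPy. The function name `rmse` and the two-ride example are illustrative assumptions, not part of the competition code.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square the errors, average them, then take the square root
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical fares: the errors are +2 and -2, so the RMSE is 2.0
example_rmse = rmse([10.0, 6.0], [8.0, 8.0])
print(example_rmse)
```

Because the errors are squared before averaging, large misses on individual rides are penalized more heavily than small ones.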

The first goal is to evaluate the Baseline model on the validation data. You will replicate the simplest Baseline, which predicts the mean of "fare_amount" for every ride. Recall that the validation strategy was a 30% holdout split, with validation_train as the training DataFrame and validation_test as the holdout DataFrame. Both of them are available in your workspace.

Calculate the mean of "fare_amount" over the whole validation_train DataFrame.
Assign this naive prediction value to all the holdout predictions. Store them in the "pred" column.
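The two steps above could be sketched as follows. In the exercise workspace `validation_train` and `validation_test` already exist; the tiny DataFrames built here are made-up stand-ins so the snippet runs on its own, and the variable names `naive_prediction` and `validation_rmse` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the workspace DataFrames (values are invented)
validation_train = pd.DataFrame({"fare_amount": [5.0, 7.0, 9.0, 11.0]})
validation_test = pd.DataFrame({"fare_amount": [6.0, 10.0]})

# Step 1: mean of "fare_amount" over the whole training part
naive_prediction = validation_train["fare_amount"].mean()

# Step 2: assign the same constant to every holdout row
validation_test["pred"] = naive_prediction

# Validation score: RMSE between true fares and the naive prediction
errors = validation_test["fare_amount"] - validation_test["pred"]
validation_rmse = float(np.sqrt(np.mean(errors ** 2)))
print("Validation RMSE for Baseline: {:.3f}".format(validation_rmse))
```

Note that the mean is computed on the training part only and then applied to the holdout rows, which mirrors how the same Baseline would be applied to the test data.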
