Baseline based on the date
We've already built three different baseline models. To get more practice, let's build a couple more. The first model is based on grouping variables. The ride fare plausibly depends on the time of day. For example, prices could be higher during rush hours.
Your goal is to build a baseline model that will assign the average "fare_amount" for the corresponding hour. For now, you will create the model for the whole train data and make predictions for the test dataset.
The train and test DataFrames are available in your workspace. Moreover, the "pickup_datetime" column in both DataFrames is already converted to a datetime object for you.
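Outside this workspace you would likely need to do that conversion yourself. The sketch below shows one way, assuming the raw column holds timestamp strings; the raw DataFrame and its values are made up for illustration and are not part of the course data.

```python
import pandas as pd

# Hypothetical raw data: timestamps stored as plain strings
raw = pd.DataFrame({
    "pickup_datetime": ["2015-01-27 13:08:24", "2015-01-27 18:45:00"],
    "fare_amount": [7.5, 12.0],
})

# Convert the string column to datetime64 so the .dt accessor becomes available
raw["pickup_datetime"] = pd.to_datetime(raw["pickup_datetime"])

# Confirm the dtype before relying on .dt
is_datetime = pd.api.types.is_datetime64_any_dtype(raw["pickup_datetime"])

# Extract the hour of day (0-23) from each timestamp
hours = raw["pickup_datetime"].dt.hour.tolist()
```

If the dtype check fails, calls to the .dt accessor raise an AttributeError, so it is worth verifying before extracting the hour.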
This exercise is part of the course
Winning a Kaggle Competition in Python
Exercise instructions
- Get the hour from the "pickup_datetime" column for the train and test DataFrames.
- Calculate the mean "fare_amount" for each hour on the train data.
- Make test predictions using pandas' map() method and the grouping obtained.
- Write predictions to the file.
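It may help to see how map() behaves when you pass it a Series. The minimal sketch below uses made-up per-hour means (hour_means and hours are illustrative names, not workspace objects) to show that the Series acts as a lookup table keyed by its index.

```python
import pandas as pd

# Hypothetical per-hour means, indexed by hour of day
hour_means = pd.Series({8: 10.0, 18: 14.5}, name="fare_amount")

# Hours to look up; hour 3 has no entry in hour_means
hours = pd.Series([18, 8, 3])

# map() replaces each value with the entry stored under that index label
mapped = hours.map(hour_means)
```

Note that a value with no matching index label becomes NaN, so an hour that never occurs in the train data would produce a missing prediction.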
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Get pickup hour from the pickup_datetime column
train['hour'] = train['pickup_datetime'].dt.____
test['hour'] = test['pickup_datetime'].dt.____
# Calculate average fare_amount grouped by pickup hour
hour_groups = train.____('____')['____'].mean()
# Make predictions on the test set
test['fare_amount'] = test.hour.map(____)
# Write predictions
test[['id','fare_amount']].____('hour_mean_sub.csv', index=False)
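For reference, here is one possible way to complete the steps, run end to end on toy data. The train and test DataFrames below are small made-up stand-ins for the workspace objects, so the numbers are illustrative only.

```python
import pandas as pd

# Toy stand-ins for the workspace DataFrames (all values are made up)
train = pd.DataFrame({
    "pickup_datetime": pd.to_datetime([
        "2015-01-01 08:10:00", "2015-01-02 08:40:00",
        "2015-01-01 18:05:00", "2015-01-03 18:30:00",
    ]),
    "fare_amount": [10.0, 12.0, 15.0, 17.0],
})
test = pd.DataFrame({
    "id": [0, 1],
    "pickup_datetime": pd.to_datetime([
        "2015-02-01 08:00:00", "2015-02-01 18:59:00",
    ]),
})

# Get pickup hour from the pickup_datetime column
train["hour"] = train["pickup_datetime"].dt.hour
test["hour"] = test["pickup_datetime"].dt.hour

# Calculate average fare_amount grouped by pickup hour
hour_groups = train.groupby("hour")["fare_amount"].mean()

# Make predictions on the test set by looking up each hour's mean
test["fare_amount"] = test["hour"].map(hour_groups)

# Write predictions
test[["id", "fare_amount"]].to_csv("hour_mean_sub.csv", index=False)
```

With this toy data, the 08:00 rides average 11.0 and the 18:00 rides average 16.0, so those are the two test predictions.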