Baseline based on the date

We've already built three different baseline models. To get more practice, let's build a couple more. The first one is based on a grouping variable. The ride fare plausibly depends on the time of day: for example, prices could be higher during rush hours.

Your goal is to build a baseline model that assigns the average "fare_amount" for the corresponding pickup hour. For now, you will fit the model on the whole train dataset and make predictions for the test dataset.

The train and test DataFrames are available in your workspace. Moreover, the "pickup_datetime" column in both DataFrames has already been converted to a datetime type for you.
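
Outside the prepared workspace, the conversion mentioned above could look roughly like the following. This is a minimal sketch on made-up rows: the `raw` DataFrame and its values are hypothetical stand-ins, not the course data.

```python
import pandas as pd

# Hypothetical toy rows standing in for the course's train data
raw = pd.DataFrame({
    "pickup_datetime": ["2015-01-27 13:08:24", "2015-01-27 18:45:00"],
    "fare_amount": [7.5, 12.0],
})

# Parse the strings into datetime values so the .dt accessor becomes available
raw["pickup_datetime"] = pd.to_datetime(raw["pickup_datetime"])

# Check the column type and read the hour component from each timestamp
is_datetime = pd.api.types.is_datetime64_any_dtype(raw["pickup_datetime"])
hours = raw["pickup_datetime"].dt.hour.tolist()
```

Once the column holds datetime values, components such as the hour can be read through the `.dt` accessor, which is what the exercise relies on.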

This exercise is part of the course

Winning a Kaggle Competition in Python

Exercise instructions

  • Get the hour from the "pickup_datetime" column for the train and test DataFrames.
  • Calculate the mean "fare_amount" for each hour on the train data.
  • Make test predictions using pandas' map() method and the grouping obtained.
  • Write the predictions to a CSV file.
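
The steps above can be sketched end to end on a tiny example. The DataFrames here are hypothetical toy data that reuse the exercise's column names, and the result is rendered as CSV text instead of being written to disk, so treat it as an illustration of the idea and not as the course's reference solution.

```python
import pandas as pd

# Hypothetical toy data: two rides at 08:xx and two rides at 17:xx
train = pd.DataFrame({
    "pickup_datetime": pd.to_datetime([
        "2015-01-01 08:10:00", "2015-01-02 08:40:00",
        "2015-01-01 17:05:00", "2015-01-03 17:30:00",
    ]),
    "fare_amount": [10.0, 14.0, 20.0, 30.0],
})
test = pd.DataFrame({
    "id": [0, 1],
    "pickup_datetime": pd.to_datetime(["2015-02-01 08:15:00", "2015-02-01 17:45:00"]),
})

# Step 1: extract the pickup hour in both DataFrames
train["hour"] = train["pickup_datetime"].dt.hour
test["hour"] = test["pickup_datetime"].dt.hour

# Step 2: mean fare per hour, as a Series indexed by hour
hour_groups = train.groupby("hour")["fare_amount"].mean()

# Step 3: look up each test row's hour in that Series
test["fare_amount"] = test["hour"].map(hour_groups)

# Step 4: render the submission columns as CSV text (no file is written here)
csv_text = test[["id", "fare_amount"]].to_csv(index=False)
```

Passing a Series to `map()` uses the Series index as the lookup key, so each test row receives the mean fare of the training rides that share its pickup hour.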

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Get pickup hour from the pickup_datetime column
train['hour'] = train['pickup_datetime'].dt.____
test['hour'] = test['pickup_datetime'].dt.____

# Calculate average fare_amount grouped by pickup hour 
hour_groups = train.____('____')['____'].mean()

# Make predictions on the test set
test['fare_amount'] = test.hour.map(____)

# Write predictions
test[['id','fare_amount']].____('hour_mean_sub.csv', index=False)
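
One caveat with this kind of grouped baseline: `map()` returns NaN for any test hour that never occurs in the train data. The sketch below shows one possible fallback, filling such gaps with the overall mean fare. The toy data and the `global_mean` name are assumptions made for illustration; the exercise itself does not ask for this step.

```python
import pandas as pd

# Hypothetical toy data: hour 3 appears in test but never in train
train = pd.DataFrame({"hour": [8, 8, 17], "fare_amount": [10.0, 14.0, 24.0]})
test = pd.DataFrame({"id": [0, 1], "hour": [8, 3]})

# Mean fare per hour, plus the overall mean as a fallback value
hour_groups = train.groupby("hour")["fare_amount"].mean()
global_mean = train["fare_amount"].mean()

# Unseen hours map to NaN; replace those with the overall mean
test["fare_amount"] = test["hour"].map(hour_groups).fillna(global_mean)
```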