
Baseline based on the date

We've already built three different baseline models. To get more practice, let's build a couple more. The first one is based on a grouping variable. The ride fare plausibly depends on the time of day: for example, prices could be higher during rush hours.

Your goal is to build a baseline model that assigns the average "fare_amount" for the corresponding pickup hour. For now, you will fit the model on the whole train data and make predictions for the test dataset.

The train and test DataFrames are available in your workspace. Moreover, the "pickup_datetime" column in both DataFrames is already converted to a datetime object for you.
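For readers working outside the course workspace, the conversion mentioned above could look roughly like the sketch below. The toy DataFrame and its values are made up for illustration; only the column names come from the exercise.

```python
import pandas as pd

# Hypothetical toy data standing in for the course's train DataFrame
train = pd.DataFrame({
    "pickup_datetime": ["2015-01-27 08:15:00", "2015-01-27 17:40:00"],
    "fare_amount": [7.5, 12.0],
})

# Parse the strings into pandas datetimes
train["pickup_datetime"] = pd.to_datetime(train["pickup_datetime"])

# The .dt accessor only works once the column has a datetime dtype
hours = train["pickup_datetime"].dt.hour.tolist()
print(hours)
```

Without this step the column would hold plain strings, and the .dt accessor used in the exercise would raise an error.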

This exercise is part of the course

Winning a Kaggle Competition in Python

Exercise instructions

  • Get the hour from the "pickup_datetime" column for the train and test DataFrames.
  • Calculate the mean "fare_amount" for each hour on the train data.
  • Make test predictions using pandas' map() method and the grouping obtained.
  • Write predictions to the file.
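The four steps above can be sketched end to end on a small made-up dataset. This is one possible reading of the instructions, not the course's reference solution; the toy values and the output file name hour_mean_sub_toy.csv are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data; the real train and test sets come from the workspace
train = pd.DataFrame({
    "pickup_datetime": pd.to_datetime([
        "2015-01-27 08:15:00", "2015-01-28 08:45:00",
        "2015-01-27 17:40:00", "2015-01-29 17:05:00",
    ]),
    "fare_amount": [8.0, 10.0, 14.0, 16.0],
})
test = pd.DataFrame({
    "id": [0, 1],
    "pickup_datetime": pd.to_datetime([
        "2015-02-01 17:30:00", "2015-02-01 08:10:00",
    ]),
})

# Step 1: extract the pickup hour in both DataFrames
train["hour"] = train["pickup_datetime"].dt.hour
test["hour"] = test["pickup_datetime"].dt.hour

# Step 2: mean fare per hour, computed on the train data only
hour_groups = train.groupby("hour")["fare_amount"].mean()

# Step 3: look up each test row's hour in the per-hour means
test["fare_amount"] = test["hour"].map(hour_groups)

# Step 4: write the predictions (toy file name, assumed for this sketch)
test[["id", "fare_amount"]].to_csv("hour_mean_sub_toy.csv", index=False)
print(test[["id", "hour", "fare_amount"]])
```

Here hour 8 averages to 9.0 and hour 17 to 15.0, so the two test rides receive 15.0 and 9.0 respectively.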

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Get pickup hour from the pickup_datetime column
train['hour'] = train['pickup_datetime'].dt.____
test['hour'] = test['pickup_datetime'].dt.____

# Calculate average fare_amount grouped by pickup hour 
hour_groups = train.____('____')['____'].mean()

# Make predictions on the test set
test['fare_amount'] = test.hour.map(____)

# Write predictions
test[['id','fare_amount']].____('hour_mean_sub.csv', index=False)
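One caveat with map(): any hour that appears in the test data but not in the train data gets NaN as its prediction. The exercise does not address this case, so the fallback below (filling with an overall mean) is only one possible choice, shown on made-up values.

```python
import pandas as pd

# Hypothetical per-hour means; hour 3 is deliberately missing
hour_groups = pd.Series({8: 9.0, 17: 15.0}, name="fare_amount")

# Assumed overall mean fare, used as the fallback prediction
global_mean = 12.0

test = pd.DataFrame({"id": [0, 1, 2], "hour": [8, 3, 17]})

# map() leaves NaN for hour 3; fillna() replaces it with the fallback
raw_predictions = test["hour"].map(hour_groups)
test["fare_amount"] = raw_predictions.fillna(global_mean)
print(test)
```

With 24 hours and a large train set this rarely matters, but the same pattern becomes important for grouping variables with many rare categories.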