Date features
You've built some basic features using numerical variables. Now, it's time to create features based on date and time. You will practice on a subsample from the Taxi Fare Prediction Kaggle competition data. The data represents information about the taxi rides and the goal is to predict the price for each ride.
Your objective is to generate date features from the pickup datetime. Recall that it's better to create new features for train and test data simultaneously. After the features are created, split the data back into the train and test DataFrames. Here it's done using the pandas isin() method.
The train and test DataFrames are already available in your workspace.
This exercise is part of the course
Winning a Kaggle Competition in Python
Exercise instructions
- Concatenate the train and test DataFrames into a single DataFrame taxi.
- Convert the "pickup_datetime" column to a datetime object.
- Create the day of week (using the .dayofweek attribute) and hour (using the .hour attribute) features from the "pickup_datetime" column.
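To make the two attributes concrete, here is a minimal sketch on a couple of made-up timestamps (the values are illustrative assumptions, not rows from the competition data). It shows that pandas numbers the days of the week from Monday = 0 to Sunday = 6, and that the hour is taken on a 24-hour clock.

```python
import pandas as pd

# Hypothetical pickup timestamps, invented for illustration only
pickups = pd.Series(["2014-06-02 08:15:00", "2014-06-07 23:40:00"])

# Strings must be converted to datetimes before the .dt accessor is available
pickups = pd.to_datetime(pickups)

# Monday is 0 and Sunday is 6
dayofweek = pickups.dt.dayofweek

# Hour of the day, 0 to 23
hour = pickups.dt.hour

print(dayofweek.tolist())  # [0, 5]: a Monday and a Saturday
print(hour.tolist())       # [8, 23]
```

Calling .dt on a column that still holds strings raises an AttributeError, which is why the conversion step comes first.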
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Concatenate train and test together
taxi = ____.____([train, test])
# Convert pickup date to datetime object
taxi['pickup_datetime'] = ____.____(taxi['pickup_datetime'])
# Create a day of week feature
taxi['dayofweek'] = taxi['pickup_datetime'].dt.____
# Create an hour feature
taxi['hour'] = taxi['pickup_datetime'].dt.____
# Split back into train and test
new_train = taxi[taxi['id'].isin(train['id'])]
new_test = taxi[taxi['id'].isin(test['id'])]
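
The following is one possible way the whole flow could look once the blanks are filled in, sketched on tiny made-up train and test DataFrames rather than the real workspace data. The column values are assumptions for illustration, and the split relies on the further assumption that "id" values are unique across train and test.

```python
import pandas as pd

# Hypothetical stand-ins for the workspace DataFrames (values are invented)
train = pd.DataFrame({
    "id": [0, 1, 2],
    "pickup_datetime": ["2014-06-02 08:15:00",
                        "2014-06-03 17:30:00",
                        "2014-06-07 23:40:00"],
    "fare_amount": [7.5, 12.0, 30.5],
})
test = pd.DataFrame({
    "id": [3, 4],
    "pickup_datetime": ["2014-06-08 01:05:00", "2014-06-09 12:00:00"],
})

# Stack train and test so each feature is computed once, in the same way
taxi = pd.concat([train, test])

# Convert the pickup strings to datetimes, then derive the date features
taxi["pickup_datetime"] = pd.to_datetime(taxi["pickup_datetime"])
taxi["dayofweek"] = taxi["pickup_datetime"].dt.dayofweek
taxi["hour"] = taxi["pickup_datetime"].dt.hour

# Split back by id; this assumes ids do not repeat across train and test
new_train = taxi[taxi["id"].isin(train["id"])]
new_test = taxi[taxi["id"].isin(test["id"])]

print(new_train.shape, new_test.shape)
```

In this sketch the test rows carry NaN in "fare_amount" after concatenation, because that column exists only in train. If an id appeared in both DataFrames, the isin() filter would place that row in both outputs, so it is worth checking uniqueness on real data.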