Date features
You've built some basic features using numerical variables. Now it's time to create features based on date and time. You will practice on a subsample of the Taxi Fare Prediction Kaggle competition data. Each row describes a taxi ride, and the goal is to predict the price of each ride.
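Your objective is to generate date features from the pickup datetime. Recall that it's better to create new features for the train and test data simultaneously, so that both sets get the same feature definitions. After the features are created, split the data back into the train and test DataFrames. Here the split is done with pandas' isin() method, which selects the rows whose "id" appears in the original train or test DataFrame.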
The train and test DataFrames are already available in your workspace.
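The combine-then-split pattern described above can be sketched on toy data. This is a minimal illustration, not the course's solution: the rows, the values and the pickup_month feature are invented for the example, and it assumes that "id" values are unique across train and test (otherwise isin() would put a row in both splits).

```python
import pandas as pd

# Toy stand-ins for the workspace DataFrames (all values are made up)
train = pd.DataFrame({
    "id": [0, 1, 2],
    "pickup_datetime": ["2015-01-05 08:15:00",
                        "2015-01-10 23:40:00",
                        "2015-01-07 12:00:00"],
    "fare_amount": [7.5, 12.0, 9.0],
})
test = pd.DataFrame({
    "id": [3, 4],
    "pickup_datetime": ["2015-01-11 17:05:00",
                        "2015-01-06 06:30:00"],
})

# Stack the rows; columns missing from test (fare_amount) are filled with NaN
combined = pd.concat([train, test])

# A feature computed once on the combined data is defined identically for both sets
# (pickup_month is a hypothetical feature used only for this illustration)
combined["pickup_month"] = pd.to_datetime(combined["pickup_datetime"]).dt.month

# Recover the original partitions by matching on the id column
new_train = combined[combined["id"].isin(train["id"])]
new_test = combined[combined["id"].isin(test["id"])]

print(new_train.shape, new_test.shape)
```

Because the split relies only on "id", it keeps working even though concatenation leaves duplicate index labels in the combined DataFrame.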
This exercise is part of the course
Winning a Kaggle Competition in Python
Exercise instructions
- Concatenate the train and test DataFrames into a single DataFrame taxi.
- Convert the "pickup_datetime" column to a datetime object.
- Create the day of week (using the .dayofweek attribute) and hour (using the .hour attribute) features from the "pickup_datetime" column.
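To see what the datetime attributes named in the instructions return, here is a small sketch on a standalone Series. The two timestamps are invented for illustration. Once a column has a datetime dtype, the .dt accessor exposes the components of each value.

```python
import pandas as pd

# Two made-up pickup times stored as strings, as they would be after reading a CSV
pickups = pd.Series(["2015-01-05 08:15:00", "2015-01-10 23:40:00"])

# Parse the strings into datetime64 values
pickups = pd.to_datetime(pickups)

# Day of week is encoded as an integer: Monday=0, ..., Sunday=6
dayofweek = pickups.dt.dayofweek

# Hour of day as an integer in the range 0-23
hour = pickups.dt.hour

print(dayofweek.tolist())  # [0, 5]: a Monday and a Saturday
print(hour.tolist())       # [8, 23]
```

Both attributes return plain integers, so the resulting columns can be passed to a model directly or treated as categorical features.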
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Concatenate train and test together
taxi = ____.____([train, test])
# Convert pickup date to datetime object
taxi['pickup_datetime'] = ____.____(taxi['pickup_datetime'])
# Create a day of week feature
taxi['dayofweek'] = taxi['pickup_datetime'].dt.____
# Create an hour feature
taxi['hour'] = taxi['pickup_datetime'].dt.____
# Split back into train and test
new_train = taxi[taxi['id'].isin(train['id'])]
new_test = taxi[taxi['id'].isin(test['id'])]