Analyzing datetime columns
Feature engineering is an important step in machine learning workflows because it turns raw columns of different data types into usable features. Datetime columns in particular are common in many datasets. In this exercise, you will explore the hour column in the dataset, which is stored as an integer but represents a datetime. First you will parse the hour column to convert it into a datetime column. Then you will extract the hour of the day from that datetime column and calculate the total number of clicks for each hour of the day.
The pandas module is available as pd in your workspace and the sample DataFrame is loaded as df.
This exercise is part of the course Predicting CTR with Machine Learning in Python.
Exercise instructions
- Convert the hour column from an integer to a datetime column using pd.to_datetime().
- Using the datetime accessor .dt, extract the hour field from the converted column using .hour.
- Compute total clicks by the extracted hour of day using .sum().
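To see how the first two steps behave, here is a minimal sketch on a toy Series. The integer values are hypothetical and only assumed to follow the YYMMDDHH layout implied by the format string '%y%m%d%H' in the sample code. The cast to string is a defensive assumption, not something the exercise requires.

```python
import pandas as pd

# Hypothetical integers in YYMMDDHH form (illustrative, not taken from the course data)
raw = pd.Series([14102100, 14102113, 14102223], name="hour")

# Cast to string so each format directive is matched against the digits,
# then parse: %y = two-digit year, %m = month, %d = day, %H = hour (00-23)
parsed = pd.to_datetime(raw.astype(str), format="%y%m%d%H")

# The .dt accessor exposes datetime fields on a datetime Series
hour_of_day = parsed.dt.hour

print(parsed)
print(hour_of_day)
```

Here 14102113 is read as 21 October 2014 at 13:00, so the extracted hour is 13.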
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Change the hour column to a datetime and extract hour of day
df['hour'] = pd.____(df['hour'], format='%y%m%d%H')
df['hour_of_day'] = df['hour'].____.____
print(df.head(5))
# Get and plot total clicks by hour of day
df.____('hour_of_day')['click'].____.plot.bar(figsize=(12,6))
plt.ylabel('Number of clicks')
plt.title('Number of clicks by hour of day')
plt.show()
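The aggregation step can be tried outside the workspace with a small stand-in DataFrame. This is a sketch under stated assumptions: the rows below are made up, and only the column names hour and click are taken from the sample code. Plotting is left out so the snippet runs without a display.

```python
import pandas as pd

# Hypothetical stand-in for the course DataFrame (values are invented)
df = pd.DataFrame({
    "hour": [14102100, 14102100, 14102113, 14102213, 14102223],
    "click": [1, 0, 1, 1, 0],
})

# Parse the integer column, then pull out the hour of the day
df["hour"] = pd.to_datetime(df["hour"].astype(str), format="%y%m%d%H")
df["hour_of_day"] = df["hour"].dt.hour

# Group rows by hour of day and add up the click column within each group
clicks_by_hour = df.groupby("hour_of_day")["click"].sum()
print(clicks_by_hour)
```

Because click is 0 or 1 per row, the sum in each group is the number of clicks in that hour, pooled across days. Chaining .plot.bar() onto the result, as the sample code does, would draw one bar per hour.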