
Creating new features

Feature engineering also includes creating new features. This matters because the model's prediction accuracy depends on the features it is given. In this exercise, you will inspect three columns that are stored as integers in the data but represent categorical values: search_engine_type, product_type, and advertiser_type. You will then create count features for these three columns, as well as for device_id and site_id. For every row, a count feature records how many rows (records) in the data share that row's value of the column. These features will be used later for prediction.
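Because .transform("count") counts rows rather than clicks, the distinction is easy to blur. A minimal sketch on a small, hypothetical DataFrame (the site_id values and click labels are made up for illustration) contrasts "count", which gives the number of rows per value, with "sum" over the 0/1 click column, which gives the number of clicks per value:

```python
import pandas as pd

# Hypothetical toy data: one row per record, click is 0 or 1
df = pd.DataFrame({
    "site_id": ["a", "a", "b", "a", "c", "b"],
    "click":   [1,   0,   0,   1,   0,   1],
})

grouped = df.groupby("site_id")["click"]

# "count" counts the non-null click values in each group, i.e. the number
# of rows sharing this site_id; transform aligns the result to every row
df["site_id_count"] = grouped.transform("count")

# "sum" over a 0/1 column counts only the rows where click == 1
df["site_id_clicks"] = grouped.transform("sum")

print(df)
```

Site "a" appears in three rows, two of which are clicks, so its rows get site_id_count 3 and site_id_clicks 2.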

The pandas module is available as pd in your workspace and the sample DataFrame is loaded as df.

This exercise is part of the course Predicting CTR with Machine Learning in Python.


Exercise instructions

  • Print the total number of values and the number of unique values for each feature in the feature_list list.
  • Create a count feature for each column in new_feature_list by grouping on that column and using .transform() to count the click values in each group.
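"Total number of values" is worth pinning down, because .count() skips missing values while len() does not. A minimal sketch on a hypothetical integer-coded column with one missing entry (using pandas' nullable "Int64" dtype so the values stay integers):

```python
import pandas as pd

# Hypothetical integer-coded categorical column with one missing value
s = pd.Series([1, 2, 2, 3, None, 1], dtype="Int64", name="product_type")

print(len(s))        # 6: all rows, including the missing one
print(s.count())     # 5: non-null values only
print(s.nunique())   # 3: distinct non-null values (1, 2, 3)
```

A small nunique() relative to count() is a hint that an integer column is really a categorical code rather than a numeric quantity.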


Sample code for this exercise:

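import pandas as pd

# df is preloaded in the exercise workspace
# Get counts of total (non-null) and unique values for given features
feature_list = ["search_engine_type", "product_type", "advertiser_type"]
for feature in feature_list:
    print(df[feature].count())    # number of non-null values
    print(df[feature].nunique())  # number of distinct values

# Define new features as counts: for every row, the number of rows
# sharing the same value of new_feature
new_feature_list = ['device_id', 'site_id'] + feature_list
for new_feature in new_feature_list:
    df[new_feature + '_count'] = df.groupby(
        new_feature)['click'].transform("count")
print(df.head(5))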
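The same loop can be wrapped in a reusable function. The sketch below uses a hypothetical helper, add_count_features, and hypothetical toy data. It works on a copy so the caller's DataFrame is not modified. As a sanity check, it compares the result against a different technique, value_counts() followed by map(), which agrees with groupby().transform("count") here because neither the grouping keys nor click contain missing values:

```python
import pandas as pd


def add_count_features(df, columns, target="click"):
    """Hypothetical helper: add a '<col>_count' column for each col.

    Mirrors the exercise loop: for each row, the number of non-null target
    values among rows sharing that row's value of col.
    """
    out = df.copy()  # avoid mutating the caller's DataFrame
    for col in columns:
        out[col + "_count"] = out.groupby(col)[target].transform("count")
    return out


# Hypothetical toy data
df = pd.DataFrame({
    "device_id": [10, 10, 11, 12, 10],
    "site_id":   [7,  8,  7,  7,  8],
    "click":     [0,  1,  0,  0,  1],
})
result = add_count_features(df, ["device_id", "site_id"])

# Cross-check with value_counts() + map(); valid here because there are no NaNs
for col in ["device_id", "site_id"]:
    alt = df[col].map(df[col].value_counts())
    assert (alt == result[col + "_count"]).all()

print(result)
```

The two approaches diverge when data is missing: groupby() drops NaN keys by default, and "count" ignores rows where click is NaN, whereas value_counts() counts rows regardless of click.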