
Creating new features

Feature engineering also involves creating new features, which matters because the model relies on them for prediction accuracy. In this exercise, you will inspect three columns that are stored as integers in the data but represent categorical values: search_engine_type, product_type, and advertiser_type. You will then create count features for those three columns, as well as for device_id and site_id. Each count feature records, for every row, how many click entries share that row's value in the given column (counting the click column per group counts its non-null entries, whether or not the ad was clicked). These features will be used later for prediction.

The pandas module is available as pd in your workspace and the sample DataFrame is loaded as df.
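
To make the "integers that are really categories" idea concrete, here is a minimal sketch on made-up data. The DataFrame df_toy and its values are hypothetical and only stand in for the exercise's df; the pandas calls themselves (dtype, count, nunique, value_counts) are standard.

```python
import pandas as pd

# Hypothetical toy data; the real df in the exercise is much larger.
df_toy = pd.DataFrame({
    "product_type": [0, 1, 1, 2, 0, 1],
    "click": [0, 1, 0, 0, 1, 1],
})

col = df_toy["product_type"]

# The column has an integer dtype even though its values are category codes.
print(col.dtype)

# Total number of non-null values versus number of distinct values.
total_values = col.count()
unique_values = col.nunique()
print(total_values, unique_values)

# How often each code occurs; few distinct codes across many rows
# suggests the column is categorical rather than numeric.
print(col.value_counts().sort_index())
```

A column with only a handful of distinct integer values across many rows is usually better treated as a set of labels than as a quantity, which is why count-based features are a reasonable encoding here.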

This exercise is part of the course

Predicting CTR with Machine Learning in Python


Instructions

  • Print the total number of values and the number of unique values for each feature in feature_list.
  • Create a new count feature for each feature in new_feature_list by counting the click entries for each value of that feature, using .transform().

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Get counts of total and unique values for given features
feature_list = ["search_engine_type", "product_type", "advertiser_type"]
for feature in feature_list:
    print(df[feature].____)
    print(df[feature].____)

# Define new features as counts
new_feature_list = ["device_id", "site_id"] + feature_list
for new_feature in new_feature_list:
    df[new_feature + "_count"] = df.____(
        new_feature)["click"].____("count")
print(df.head(5))
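
One plausible way to complete the blanks is sketched below. It assumes the intended methods are .count(), .nunique(), .groupby(), and .transform(); only .transform() is named in the instructions, so the others are a reasonable guess rather than a confirmed solution. The small DataFrame is hypothetical and replaces the exercise's preloaded df so the sketch runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the preloaded df in the exercise workspace.
df = pd.DataFrame({
    "device_id": ["a", "a", "b", "c", "a"],
    "site_id": ["s1", "s2", "s1", "s1", "s2"],
    "search_engine_type": [1, 1, 2, 1, 2],
    "product_type": [0, 1, 1, 0, 0],
    "advertiser_type": [3, 3, 3, 4, 4],
    "click": [0, 1, 0, 1, 0],
})

# Get counts of total and unique values for given features
feature_list = ["search_engine_type", "product_type", "advertiser_type"]
for feature in feature_list:
    print(df[feature].count())    # number of non-null values
    print(df[feature].nunique())  # number of distinct values

# Define new features as counts: transform("count") returns one value per
# original row, so the result can be assigned directly as a new column.
new_feature_list = ["device_id", "site_id"] + feature_list
for new_feature in new_feature_list:
    df[new_feature + "_count"] = (
        df.groupby(new_feature)["click"].transform("count")
    )
print(df.head(5))
```

Using .transform("count") rather than an aggregation keeps the result aligned with the original index, so no merge back onto df is needed.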