Standard scaling

Standard scaling transforms each numerical feature so that it has a mean of 0 and a variance of 1. In this exercise, you will apply standard scaling using StandardScaler() from sklearn. First, you will select only the columns that should be scaled: start from the numerical columns, then exclude those that are numerically encoded but should not be scaled (such as the target, click). This filtering is already provided and is done through the use of regular expressions, which allow for partial string matches. Then you will use fit_transform() to transform the relevant columns.
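
As a minimal sketch of what "mean of 0 and variance of 1" means, the snippet below standardizes a small list by hand: subtract the mean, then divide by the population standard deviation. The values are hypothetical and chosen only for illustration; they are not from the course data. StandardScaler with its default settings applies the same per-column formula.

```python
from statistics import fmean, pstdev, pvariance

# Hypothetical feature values, chosen only for illustration
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = fmean(values)      # mean of the raw values
sigma = pstdev(values)  # population standard deviation (divides by n)

# z-score: center on the mean, then rescale by the standard deviation
scaled = [(v - mu) / sigma for v in values]

print(fmean(scaled))      # mean of the scaled values
print(pvariance(scaled))  # population variance of the scaled values
```

The scaled values should have a mean of 0 and a population variance of 1, up to floating-point error.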

The pandas module is available as pd in your workspace and the sample DataFrame is loaded as df. Additionally, the hour column is already converted to a datetime, and StandardScaler from sklearn.preprocessing is available.

This is a part of the course

“Predicting CTR with Machine Learning in Python”

Exercise instructions

  • Select the numerical columns using .select_dtypes(), then exclude the columns listed in filter_cols.
  • Apply standard scaling to the relevant columns by first creating a StandardScaler() and then using .fit_transform().
  • Print the mean and variance of the newly transformed columns using .mean() and .var().

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Get non-categorical columns, with a filter
num_df = df.____(include=['int', 'float'])
filter_cols = ['click', 'banner_pos', 'device_type',
               'search_engine_type', 'product_type', 'advertiser_type']
new_df = num_df[num_df.columns[~num_df.columns.____(filter_cols)]]
num_cols = new_df.____

# Transform columns using StandardScaler
scaler = ____()
df[num_cols] = scaler.____(df[____])

# Print mean and variance of transformed columns
print(df[num_cols].mean())
print(df[num_cols].____)
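
For reference, here is a hedged sketch of the same workflow on a toy DataFrame. It is not the course solution: the data and the width, height, and site_id columns are hypothetical, filter_cols is shortened, and excluding columns by exact name with .isin() is an assumption about how the filter could be written. The sketch also uses include="number" rather than ['int', 'float'] so that every numeric dtype is picked up.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; width, height and site_id are made-up columns
df = pd.DataFrame({
    "click": [0, 1, 0, 1, 0],
    "banner_pos": [0, 1, 0, 0, 1],
    "width": [320.0, 300.0, 728.0, 320.0, 160.0],
    "height": [50.0, 250.0, 90.0, 50.0, 600.0],
    "site_id": ["a", "b", "a", "c", "b"],
})

# Keep numeric columns only, then drop the ones that should not be scaled
num_df = df.select_dtypes(include="number")
filter_cols = ["click", "banner_pos"]
# Assumption: exclude by exact column name using isin()
num_cols = num_df.columns[~num_df.columns.isin(filter_cols)]

# Fit the scaler on the selected columns and overwrite them in place
scaler = StandardScaler()
df[num_cols] = scaler.fit_transform(df[num_cols])

print(df[num_cols].mean())       # close to 0 for each scaled column
print(df[num_cols].var(ddof=0))  # population variance, close to 1
print(df[num_cols].var())        # pandas default (ddof=1) gives n / (n - 1)
```

Note that StandardScaler divides by the population standard deviation, while pandas .var() defaults to the sample variance (ddof=1). The printed variance is therefore n / (n - 1) rather than exactly 1, a gap that becomes negligible on a large dataset.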

This exercise is part of the course

Predicting CTR with Machine Learning in Python

Skill level: Intermediate

Learn how to predict click-through rates on ads and implement basic machine learning models in Python so that you can see how to better optimize your ads.

This chapter provides the foundations for exploratory data analysis (EDA). Using sample data, you’ll use the pandas library to look at columns and data types, explore missing data, and use hashing to perform feature engineering on categorical features, all of which are important when exploring features for more accurate CTR prediction.

  • Exercise 1: Exploratory data analysis
  • Exercise 2: A first look
  • Exercise 3: Checking for missing values
  • Exercise 4: Distributions by CTR
  • Exercise 5: Feature engineering
  • Exercise 6: Analyzing datetime columns
  • Exercise 7: Converting categorical variables
  • Exercise 8: Creating new features
  • Exercise 9: Standardizing features
  • Exercise 10: Log normalization
  • Exercise 11: Understanding standardization
  • Exercise 12: Standard scaling