Identifying features for standardization
In this exercise, you'll investigate the variance of columns in the UFO dataset to determine which features should be standardized. After taking a look at the variances of the seconds and minutes columns, you'll see that the variance of the seconds column is extremely high. Because seconds and minutes are related to each other (an issue we'll deal with when we select features for modeling), let's log normalize the seconds column.
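To see why a log transform tames a wide-ranging column, here is a minimal sketch on a few made-up durations (the values and the names durations and logged are illustrative assumptions, not taken from the UFO dataset). It compares the spread of the raw values with the spread after taking the natural log.

```python
import numpy as np

# Hypothetical durations in seconds: 1 second, 1 minute, 1 hour, 1 day
durations = np.array([1.0, 60.0, 3600.0, 86400.0])

# Natural log of each value; large values are compressed far more than small ones
logged = np.log(durations)

raw_spread = durations.max() - durations.min()
log_spread = logged.max() - logged.min()

print(raw_spread)  # spread of the raw values
print(log_spread)  # spread after the natural log
```

The raw values span several orders of magnitude, while the logged values fall within a range of about a dozen units, which is why the variance drops so sharply.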
This exercise is part of the course
Preprocessing for Machine Learning in Python
Exercise instructions
- Calculate the variance in the seconds and minutes columns and take a close look at the results.
- Perform log normalization on the seconds column, transforming it into a new column named seconds_log.
- Print out the variance of the seconds_log column.
Interactive exercise
Try this exercise by completing this sample code.
# Check the variance of the seconds and minutes columns
print(____)
# Log normalize the seconds column
ufo["seconds_log"] = ____
# Print out the variance of just the seconds_log column
print(____)
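
For reference, the same three steps can be sketched on a small stand-in DataFrame. This is not the exercise solution: the toy frame and its values are invented for illustration, and it assumes log normalization means applying NumPy's natural log (np.log) to a column of strictly positive values.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the UFO data (values are made up for illustration)
toy = pd.DataFrame({
    "seconds": [5.0, 30.0, 120.0, 600.0, 7200.0],
    "minutes": [5 / 60, 0.5, 2.0, 10.0, 120.0],
})

# Variance of both columns at once; .var() returns one value per column
variances = toy[["seconds", "minutes"]].var()
print(variances)

# Log normalize: take the natural log of each value in the column
toy["seconds_log"] = np.log(toy["seconds"])

# Variance of just the new column
seconds_log_var = toy["seconds_log"].var()
print(seconds_log_var)
```

Note that np.log returns -inf for zero and NaN for negative inputs, so a real column may need to be checked or filtered before this transform is applied.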