Aan de slagGa gratis aan de slag

Identifying features for standardization

In this exercise, you'll investigate the variance of columns in the UFO dataset to determine which features should be standardized. After taking a look at the variances of the seconds and minutes column, you'll see that the variance of the seconds column is extremely high. Because seconds and minutes are related to each other (an issue we'll deal with when we select features for modeling), let's log normalize the seconds column.

Deze oefening maakt deel uit van de cursus

Preprocessing for Machine Learning in Python

Cursus bekijken

Oefeninstructies

  • Calculate the variance in the seconds and minutes columns and take a close look at the results.
  • Perform log normalization on the seconds column, transforming it into a new column named seconds_log.
  • Print out the variance of the seconds_log column.

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

# Check the variance of the seconds and minutes columns
print(____)

# Log normalize the seconds column
ufo["seconds_log"] = ____

# Print out the variance of just the seconds_log column
print(____)
Code bewerken en uitvoeren