
Standardization

While normalization is useful for scaling a column to a fixed range (typically between its minimum and maximum values), two normalized columns are hard to compare if even one of them is heavily affected by outliers. A commonly used alternative is standardization: instead of imposing strict upper and lower bounds, you center the data on its mean and express each data point as the number of standard deviations it lies from that mean.
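The calculation described above can be sketched by hand. This is a minimal illustration using only the standard library; the `ages` list is made-up sample data, not the course dataset, and the population standard deviation is used because that matches what scikit-learn's StandardScaler computes.

```python
from statistics import mean, pstdev

# Hypothetical sample values (not taken from the course data)
ages = [22, 35, 41, 58, 64]

mu = mean(ages)       # center of the data
sigma = pstdev(ages)  # population standard deviation (divides by n)

# Each value becomes "how many standard deviations from the mean"
z_scores = [(a - mu) / sigma for a in ages]

for age, z in zip(ages, z_scores):
    print(f"{age:>3} -> {z:+.3f}")
```

After this transformation the values have a mean of 0 and a standard deviation of 1, so columns measured in different units end up on a comparable scale.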

This exercise is part of the course

Feature Engineering for Machine Learning in Python


Exercise instructions

  • Import StandardScaler from sklearn's preprocessing module.
  • Instantiate the StandardScaler() as SS_scaler.
  • Fit the StandardScaler on the Age column of so_numeric_df.
  • Transform the same column with the scaler you just fit.
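The same import, instantiate, fit, transform sequence can be tried on a toy example before working on the exercise data. This is only a sketch: `toy_df` and `scaler` are hypothetical names with made-up values, standing in for `so_numeric_df` and `SS_scaler`.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the exercise DataFrame
toy_df = pd.DataFrame({"Age": [22, 35, 41, 58, 64]})

# Instantiate the scaler
scaler = StandardScaler()

# Fit: learns the column mean and standard deviation.
# Double brackets keep the input 2-D, which scikit-learn expects.
scaler.fit(toy_df[["Age"]])

# Transform: applies (x - mean) / std using the fitted statistics
toy_df["Age_SS"] = scaler.transform(toy_df[["Age"]]).ravel()

print(toy_df)
```

Keeping fit and transform as separate steps means the statistics learned from one dataset can later be reused to transform other data in exactly the same way.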

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Import StandardScaler
____

# Instantiate StandardScaler
SS_scaler = ____()

# Fit SS_scaler to the data
____.____(so_numeric_df[['Age']])

# Transform the data using the fitted scaler
so_numeric_df['Age_SS'] = ____.____(so_numeric_df[['Age']])

# Compare the original and transformed column
print(so_numeric_df[['Age_SS', 'Age']].head())