MulaiMulai sekarang secara gratis

Practicing standardization

It is dangerous to use KNN on unknown distributions blindly. Its performance suffers greatly when the feature distributions don't have the same scales. Unscaled features will skew distance calculations and thus return unrealistic anomaly scores.

A common technique to counter this is using standardization, which involves removing the mean from a feature and dividing it by the standard deviation. This has the effect of making the feature have a mean of 0 and a variance of 1.

Practice standardization on the females dataset, which has already been loaded for you.

Latihan ini adalah bagian dari kursus

Anomaly Detection in Python

Lihat Kursus

Petunjuk latihan

  • Create an instance of StandardScaler() and store it as ss.
  • Extract feature and target arrays into X and y. The target is the weightkg column.
  • Fit StandardScaler() to X and transform it simultaneously.
  • Repeat the above process but preserve the column names of the X DataFrame.

Latihan interaktif praktis

Cobalah latihan ini dengan menyelesaikan kode contoh berikut.

from sklearn.preprocessing import StandardScaler

# Initialize a StandardScaler
ss = ____

# Extract feature and target arrays
X = ____ 
y = ____

# Fit/transform X
X_transformed = ____

# Fit/transform X but preserve the column names
X.____ = ____
Edit dan Jalankan Kode