
Scaling the data

For ML algorithms that rely on distance-based metrics, it is crucial to scale your data, because features measured on different scales will distort the results. K-means uses the Euclidean distance to measure how far each point is from the cluster centroids, so you need to scale your data before implementing the algorithm. Let's do that first.
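
To make the distortion concrete, here is a minimal sketch on made-up numbers (the array `X_toy` and the helper `euclidean` are hypothetical and not part of the exercise). One column is on the order of thousands and the other lies between 0 and 1, so the raw Euclidean distance is driven almost entirely by the first column until both are rescaled.

```python
import numpy as np

# Hypothetical toy data (not the exercise's df):
# column 0 is a large-valued feature, column 1 is a ratio between 0 and 1.
X_toy = np.array([
    [1000.0, 0.10],
    [1010.0, 0.90],
    [2000.0, 0.12],
])

def euclidean(a, b):
    """Euclidean distance between two 1-D arrays."""
    return np.sqrt(np.sum((a - b) ** 2))

# Raw distances: the large-valued column dominates.
d_raw_01 = euclidean(X_toy[0], X_toy[1])
d_raw_02 = euclidean(X_toy[0], X_toy[2])

# Min-max scaling done by hand: (x - min) / (max - min), per column.
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
X_toy_scaled = (X_toy - col_min) / (col_max - col_min)

# Scaled distances: both columns now contribute on the same 0-1 range.
d_scaled_01 = euclidean(X_toy_scaled[0], X_toy_scaled[1])
d_scaled_02 = euclidean(X_toy_scaled[0], X_toy_scaled[2])

print("raw:   ", d_raw_01, d_raw_02)
print("scaled:", d_scaled_01, d_scaled_02)
```

Before scaling, rows 0 and 1 look close only because their difference sits in the small-valued column; after scaling, that difference counts as much as the gap between rows 0 and 2 in the first column.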

The dataframe df from the previous exercise is available, with some minor data preparation already done so that it is ready to use with sklearn. The fraud labels are stored separately under labels; you can use them to check the results later. numpy has been imported as np.

This exercise is part of the course

Fraud Detection in Python


Exercise instructions

  • Import the MinMaxScaler.
  • Transform your dataframe df into a numpy array X by taking only the values of df, and make sure all values are floats.
  • Apply the defined scaler to X to obtain the scaled values X_scaled, forcing all features onto a 0-1 scale.

Interactive exercise

Try this exercise by completing the sample code.

# Import the scaler
from sklearn.preprocessing import ____

# Take the float values of df for X
X = df.values.astype(np.____)

# Define the scaler and apply to the data
scaler = ____()
X_scaled = scaler.____(X)
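
One possible completion of the template above is sketched here, under stated assumptions: the small `df` built in the snippet is a hypothetical stand-in for the exercise's dataframe (its column names and values are invented), and `np.float64` is used for the cast because the older `np.float` alias was removed in NumPy 1.24. It is meant as an illustration of the three steps, not as the official solution.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the exercise's df; columns and values are made up.
df = pd.DataFrame({
    "amount": [10, 250, 40, 1200],
    "age_group": [1, 3, 2, 5],
})

# Take the values of df as a float array.
X = df.values.astype(np.float64)

# Define the scaler, then fit it on X and transform X in one call.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Each column should now span the 0-1 range.
print(X_scaled.min(axis=0))
print(X_scaled.max(axis=0))
```

`fit_transform` learns each column's minimum and maximum from X and applies the rescaling in a single step; calling `fit` followed by `transform` would give the same result.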