Exercise

# Scaling the data

For ML algorithms that use distance-based metrics, it is **crucial to always scale your data**, because features on different scales will distort your results. K-means uses the Euclidean distance to measure how far each point is from the cluster centroids, so you need to scale your data before implementing the algorithm. Let's do that first.
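To see why scale matters for Euclidean distance, here is a minimal sketch on made-up numbers. The array `X_toy` and the helper `euclidean` are hypothetical and not part of the exercise; the first column stands in for a large-valued feature such as an amount, the second for a 0-1 ratio.

```python
import numpy as np

# Hypothetical toy data (not from the exercise):
# column 0 is a large-valued feature, column 1 is a 0-1 ratio.
X_toy = np.array([
    [100.0, 0.1],
    [110.0, 0.9],
    [900.0, 0.1],
])


def euclidean(a, b):
    """Euclidean distance between two 1-D arrays."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


# On the raw data, the large-valued column dominates the distance.
d_raw_01 = euclidean(X_toy[0], X_toy[1])
d_raw_02 = euclidean(X_toy[0], X_toy[2])

# Min-max scale each column to the 0-1 range by hand.
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
X_toy_scaled = (X_toy - col_min) / (col_max - col_min)

# After scaling, both columns contribute on a comparable footing.
d_scaled_01 = euclidean(X_toy_scaled[0], X_toy_scaled[1])
d_scaled_02 = euclidean(X_toy_scaled[0], X_toy_scaled[2])

print("raw:   ", d_raw_01, d_raw_02)
print("scaled:", d_scaled_01, d_scaled_02)
```

On the raw values, the difference in the ratio column barely registers next to the difference in the large-valued column. After scaling, a full-range change in either feature counts about equally.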

The dataframe `df` from the previous exercise is available, with some minor data preparation done so that it is ready for you to use with `sklearn`. The fraud labels are stored separately under `labels`; you can use those to check the results later. `numpy` has been imported as `np`.

Instructions

- Import the `MinMaxScaler`.
- Transform your dataframe `df` into a numpy array `X` by taking only the values of `df`, and make sure you have all `float` values.
- Apply the defined scaler onto `X` to obtain the scaled values `X_scaled`, forcing all your features to a 0-1 scale.
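The steps above can be sketched as follows. This is one possible approach, not the official solution. Because the exercise's `df` is not available here, the snippet builds a small hypothetical stand-in with made-up columns; in the exercise environment you would skip that part and use the provided `df`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the exercise's `df` (made-up columns and values).
df = pd.DataFrame({
    "amount": [100, 250, 900, 40],
    "age": [25, 40, 33, 58],
})

# Take only the values of the dataframe and cast them to float.
X = df.values.astype(float)

# Define the scaler and apply it; the default feature range is (0, 1).
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.min(axis=0), X_scaled.max(axis=0))
```

The cast uses the built-in `float` rather than the `np.float` alias, which has been removed from recent NumPy versions.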