
Modeling without normalizing

Let's take a look at what might happen to your model's accuracy if you try to model data without doing some sort of standardization first.

Here we have a subset of the wine dataset. One of the columns, Proline, has an extremely high variance compared to the other columns. This is an example of where a technique like log normalization would come in handy, which you'll learn about in the next section.
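To see the imbalance for yourself, you can inspect the per-column variances. Below is a minimal sketch, assuming the data is loaded into a pandas DataFrame named wine_df; the loading code and variable names are illustrative and not part of the exercise, which uses only a subset of the columns.

import pandas as pd
from sklearn.datasets import load_wine

# Load the wine dataset into a DataFrame for inspection
bunch = load_wine()
wine_df = pd.DataFrame(bunch.data, columns=bunch.feature_names)

# 'proline' has a variance several orders of magnitude larger than the rest
print(wine_df.var().sort_values(ascending=False))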

The scikit-learn model training process should be familiar to you at this point, so we won't go too in-depth with it. You already have a k-nearest neighbors model available (knn) as well as the X and y sets you need to fit and score on.
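For reference, the provided objects could have been set up along these lines. This is a sketch under the assumption that X and y come from the same wine data; only the names knn, X, and y are given by the exercise environment.

from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier

bunch = load_wine()
X = bunch.data    # feature matrix, including the high-variance proline column
y = bunch.target  # class labels for the three wine cultivars
knn = KNeighborsClassifier()  # k-nearest neighbors classifier with default settings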

This exercise is part of the course

Preprocessing for Machine Learning in Python


Exercise instructions

  • Split up the X and y sets into training and test sets, ensuring that class labels are equally distributed in both sets.
  • Fit the knn model to the training features and labels.
  • Print the test set accuracy of the knn model using the .score() method.

Interactive exercise

Try this exercise by completing the sample code below.

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Split the dataset into training and test sets, stratifying on the class labels
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

knn = KNeighborsClassifier()

# Fit the knn model to the training data
knn.fit(X_train, y_train)

# Score the model on the test data
print(knn.score(X_test, y_test))