1. Learn
  2. /
  3. Courses
  4. /
  5. Preprocessing for Machine Learning in Python

Exercise

Modeling without normalizing

Let's take a look at what might happen to your model's accuracy if you try to model data without doing some sort of standardization first.

Here we have a subset of the wine dataset. One of the columns, Proline, has an extremely high variance compared to the other columns. This is an example of where a technique like log normalization would come in handy, which you'll learn about in the next section.

The scikit-learn model training process should be familiar to you at this point, so we won't go too in-depth with it. You already have a k-nearest neighbors model available (knn) as well as the X and y sets you need to fit and score on.

Instructions

100 XP
  • Split up the X and y sets into training and test sets, ensuring that class labels are equally distributed in both sets.
  • Fit the knn model to the training features and labels.
  • Print the test set accuracy of the knn model using the .score() method.