Exercise

# Modeling without normalizing

Let's take a look at what might happen to your model's accuracy if you try to model data without doing some sort of standardization first. Here we have a subset of the `wine` dataset. One of the columns, `Proline`, has an extremely high variance compared to the other columns. This is an example of where a technique like log normalization would come in handy, which you'll learn about in the next section.
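To see why `Proline` is the problem column, you can compare per-column variances directly. A minimal sketch, using scikit-learn's built-in wine dataset as a stand-in for the exercise's `wine` subset (where the column is named `proline`):

```python
# Inspect per-column variance to see why proline dominates the dataset.
# Note: scikit-learn's built-in wine data is an assumption standing in
# for the exercise's `wine` DataFrame.
from sklearn.datasets import load_wine

wine = load_wine(as_frame=True).frame.drop(columns="target")
print(wine.var().sort_values(ascending=False).head())
```

The variance of `proline` sits orders of magnitude above every other column, which is exactly the situation where distance-based models like k-nearest neighbors suffer without normalization.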

The scikit-learn model training process should be familiar to you at this point, so we won't go too in-depth with it. You already have a k-nearest neighbors model available (`knn`) as well as the `X` and `y` sets you need to fit and score on.

Instructions


- Split up the `X` and `y` sets into training and test sets using `train_test_split()`.
- Use the `knn` model's `fit()` method on the `X_train` data and `y_train` labels to fit the model to the data.
- Print out the `knn` model's `score()` on the `X_test` data and `y_test` labels to evaluate the model.
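The steps above can be sketched as follows. The exercise provides `knn`, `X`, and `y` for you; here we construct equivalents from scikit-learn's built-in wine dataset (an assumption, so the snippet runs on its own):

```python
# Hypothetical setup: the exercise already supplies knn, X, and y.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
knn = KNeighborsClassifier()

# Split the data, fit on the training set, and score on the held-out test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```

Because `Proline`'s huge variance dominates the Euclidean distances k-nearest neighbors relies on, expect the accuracy here to be noticeably lower than what you'd get after normalizing.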