Exercise

# Modeling without normalizing

Let's take a look at what might happen to your model's accuracy if you try to model data without doing some sort of standardization first. Here we have a subset of the `wine` dataset. One of the columns, `Proline`, has an extremely high variance compared to the other columns. This is an example of where a technique like log normalization would come in handy, which you'll learn about in the next section.
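To see why `Proline` is the problem column, you can compare per-column variances directly. A minimal sketch, using scikit-learn's built-in wine dataset as a stand-in for the exercise's `wine` subset (where the column is named `proline`):

```python
# Inspect per-column variance to see why proline dominates the dataset.
# Note: scikit-learn's built-in wine data is an assumption standing in
# for the exercise's `wine` DataFrame.
from sklearn.datasets import load_wine

wine = load_wine(as_frame=True).frame.drop(columns="target")
print(wine.var().sort_values(ascending=False).head())
```

The variance of `proline` sits orders of magnitude above every other column, which is exactly the situation where distance-based models like k-nearest neighbors suffer without normalization.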

The scikit-learn model training process should be familiar to you at this point, so we won't go too in-depth with it. You already have a k-nearest neighbors model available (`knn`) as well as the `X` and `y` sets you need to fit and score on.

Instructions


- Split up the `X` and `y` sets into training and test sets using `train_test_split()`.
- Use the `knn` model's `fit()` method on the `X_train` data and `y_train` labels to fit the model to the data.
- Print out the `knn` model's `score()` on the `X_test` data and `y_test` labels to evaluate the model.
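The steps above can be sketched as follows. The exercise provides `knn`, `X`, and `y` for you; here we construct equivalents from scikit-learn's built-in wine dataset (an assumption, so the snippet runs on its own):

```python
# Hypothetical setup: the exercise already supplies knn, X, and y.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
knn = KNeighborsClassifier()

# Split the data, fit on the training set, and score on the held-out test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```

Because `Proline`'s huge variance dominates the Euclidean distances k-nearest neighbors relies on, expect the accuracy here to be noticeably lower than what you'd get after normalizing.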