Standardized data and modeling
1. Standardized data and modeling
Now that we've learned a couple of different methods for standardization, it's time to see how this fits into the modeling workflow. As mentioned before, many models in scikit-learn require our data to be scaled appropriately across columns; otherwise, we risk biasing the results.
2. K-nearest neighbors
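To see why scale matters for a distance-based model such as k-nearest neighbors, here is a small sketch. The three-row array is hypothetical toy data made up for illustration, and the `dist` helper is our own name, not part of scikit-learn. It compares Euclidean distances before and after applying `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column 0 is measured in the thousands,
# column 1 in single units.
X = np.array([
    [1000.0, 1.0],
    [1010.0, 9.0],
    [2000.0, 1.5],
])


def dist(a, b):
    """Euclidean distance between two rows (illustrative helper)."""
    return float(np.linalg.norm(a - b))


# On the raw data, the distance is driven almost entirely by column 0.
raw_d01 = dist(X[0], X[1])
raw_d02 = dist(X[0], X[2])

# After standardization, each column has mean 0 and unit variance,
# so both columns contribute on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)
scaled_d01 = dist(X_scaled[0], X_scaled[1])
scaled_d02 = dist(X_scaled[0], X_scaled[2])

print("raw:   ", raw_d01, raw_d02)
print("scaled:", scaled_d01, scaled_d02)
```

On the raw values, row 0 looks far closer to row 1 than to row 2, simply because column 0 has much larger numbers. Once both columns are standardized, the large difference in column 1 counts as well, and the two distances end up on a similar order of magnitude.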
You should already be a little familiar with both k-nearest neighbors and the scikit-learn workflow from previous courses, but we'll do a quick review of each. K-nearest neighbors is a model that classifies data based on its distance to the training set data. A new data point is assigned the label of the class that the majority of its nearest neighbors belong to.
The workflow for training a model in scikit-learn starts with splitting the data into a training set and a test set. This can be done with scikit-learn's train_test_split function. Splitting the data allows us to evaluate the model's performance on unseen data, rather than on the data it was trained on. Once the data has been split, we can begin preprocessing the training data. It's really important to split the data prior to preprocessing, so that none of the test data is used to train the model. When non-training data is used to train the model, this is called data leakage, and it should be avoided so that any performance metrics reflect the model's ability to generalize to unseen data.
We instantiate a k-neighbors classifier and a standard scaler to scale our features. Here, we fit the scaler to the training features and transform them in one step using the fit_transform method, and then preprocess the test features using the transform method. Using only transform on the test features means the scaler's mean and standard deviation come from the training data alone, so the test features play no part in fitting, which avoids data leakage. Now that we've finished preprocessing, we can fit the KNN model to the scaled training features and return the test set accuracy by calling the score method on the scaled test features and test labels.
3. Let's practice!
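As a recap before the exercises, the split, scale, fit, and score workflow described above can be sketched as follows. The dataset (scikit-learn's built-in wine data via `load_wine`), the `random_state` value, and the `stratify` argument are assumptions chosen for illustration. They are not necessarily what the course uses.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in wine dataset stands in for the course data.
X, y = load_wine(return_X_y=True)

# 1. Split first, so the test set stays unseen during preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# 2. Instantiate the model and the scaler.
knn = KNeighborsClassifier()
scaler = StandardScaler()

# 3. Fit the scaler on the training features only, then transform them.
X_train_scaled = scaler.fit_transform(X_train)

# 4. Reuse the training statistics to transform the test features.
#    Calling only transform here is what avoids data leakage.
X_test_scaled = scaler.transform(X_test)

# 5. Fit KNN on the scaled training data and score on the scaled test data.
knn.fit(X_train_scaled, y_train)
accuracy = knn.score(X_test_scaled, y_test)
print(accuracy)
```

Note that the scaled test features will generally not have a mean of exactly zero, because they are centered with the training set's mean. That is expected, and it is the behavior we want.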
Now it's your turn to put everything together!