Optimize n_neighbors
Now that we have scaled data, we can try using a KNN model. To maximize performance, we should tune our model's hyperparameters. For the k-nearest neighbors algorithm, the main hyperparameter is n, the number of neighbors. We set this hyperparameter when we create the model with KNeighborsRegressor, using the n_neighbors argument.
We want to try a range of values that passes through the setting with the best performance. Usually we start with 2 neighbors and increase until our scoring metric starts to decrease. We'll use the R\(^2\) value from the .score() method on the test set (scaled_test_features and test_targets) to optimize n here, and the test set scores will determine the best n.
This exercise is part of the course Machine Learning for Finance in Python.
Exercise instructions
- Loop through values of 2 to 12 for n and set this as n_neighbors in the knn model.
- Fit the model to the training data (scaled_train_features and train_targets).
- Print out the R\(^2\) values using the .score() method of the knn model for the train and test sets, and take note of the best score on the test set.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
from sklearn.neighbors import KNeighborsRegressor
for n in range(____):
    # Create and fit the KNN model
    knn = KNeighborsRegressor(n_neighbors=____)

    # Fit the model to the training data
    knn.fit(____, ____)

    # Print number of neighbors and the score to find the best value of n
    print("n_neighbors =", n)
    print('train, test scores')
    print(knn.score(scaled_train_features, train_targets))
    print(knn.score(____, ____))
    print()  # prints a blank line
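To see the full loop in action, here is a completed sketch of the same tuning procedure. The course's actual data (scaled_train_features, train_targets, and so on) is not available here, so this version builds small synthetic stand-ins for those arrays; only the loop itself mirrors the exercise.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the course's feature/target arrays (assumption:
# the real exercise supplies these variables already scaled and split)
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
targets = features @ np.array([0.5, -1.0, 2.0, 0.1]) + rng.normal(scale=0.1, size=200)

train_features, test_features = features[:150], features[150:]
train_targets, test_targets = targets[:150], targets[150:]

# Scale using statistics from the training set only
scaler = StandardScaler().fit(train_features)
scaled_train_features = scaler.transform(train_features)
scaled_test_features = scaler.transform(test_features)

test_scores = {}
for n in range(2, 13):
    # Create and fit the KNN model with the current number of neighbors
    knn = KNeighborsRegressor(n_neighbors=n)
    knn.fit(scaled_train_features, train_targets)

    # Print the number of neighbors and train/test R^2 scores
    print("n_neighbors =", n)
    print('train, test scores')
    print(knn.score(scaled_train_features, train_targets))
    print(knn.score(scaled_test_features, test_targets))
    print()  # prints a blank line

    test_scores[n] = knn.score(scaled_test_features, test_targets)

# The best n is the one with the highest test-set R^2
best_n = max(test_scores, key=test_scores.get)
print("best n_neighbors:", best_n)
```

Note that the train-set R\(^2\) tends to fall as n grows, while the test-set R\(^2\) typically rises and then falls; the peak of the test score is the value of n to keep.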