1. What about the 'k' in kNN?
You may be wondering why kNN is called 'k' Nearest Neighbors. What exactly is 'k'?
The letter k is a variable that specifies the number of neighbors to consider when making the classification. You can imagine it as determining the size of the neighborhoods.
Until now, we've ignored k, so R has used the default value of 1. This means that only the single nearest, most similar neighbor was used to classify the unlabeled example.
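For instance, a minimal sketch using the knn() function from R's class package might look like the following. The signs_train, signs_test, and sign_types objects are hypothetical names standing in for the labeled training features, the unlabeled example, and the training labels.

    library(class)

    # signs_train: numeric features for previously observed signs (hypothetical)
    # signs_test:  features for the new, unlabeled sign (hypothetical)
    # sign_types:  vector of labels for signs_train (hypothetical)
    pred <- knn(train = signs_train, test = signs_test, cl = sign_types)
    # k defaults to 1, so only the single nearest neighbor decides the label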
While this seems OK on the surface, let's work through an example to see why the value of k may have a substantial impact on the performance of our classifier.
2. Choosing 'k' neighbors
Suppose our vehicle observed the sign at the center of the image here. Its five nearest neighbors are depicted.
The single nearest neighbor is a speed limit sign, which shares a very similar background color. Unfortunately, in this case, a kNN classifier with k set to one would make an incorrect classification.
Slightly further away are the second, third, and fourth nearest neighbors, which are all pedestrian crossing signs. Suppose we set k to three. What would happen?
The three nearest neighbors, a speed limit sign and two pedestrian crossing signs, would take a vote. The category with the majority of nearest neighbors, in this case the pedestrian crossing sign, is the winner.
Increasing k to five allows the five nearest neighbors to vote. The pedestrian crossing sign still wins with a margin of 3-to-2. Note that in the case of a tie, the winner is typically decided at random.
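To make the vote concrete, here is a minimal sketch in R using a hypothetical vector of neighbor labels, with ties broken at random as described above.

    # Labels of the three nearest neighbors (hypothetical values)
    neighbors <- c("speed limit", "pedestrian", "pedestrian")

    # Tally the votes, find the top category, and break any tie at random
    tally <- table(neighbors)
    winners <- names(tally)[tally == max(tally)]
    sample(winners, 1)  # "pedestrian" wins the 2-to-1 vote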
3. Bigger 'k' is not always better
In the previous example, setting k to a higher value resulted in a correct prediction. But it is not always the case that bigger is better.
A small k creates very small neighborhoods; the classifier is able to discover very subtle patterns. As this image illustrates, you might imagine it as being able to distinguish between groups even when their boundary is somewhat "fuzzy."
On the other hand, sometimes a "fuzzy" boundary is not a true pattern, but rather the result of some other factor that adds randomness to the data. This is called noise. Setting k larger, as this image shows, ignores some potentially noisy points in an effort to discover a broader, more general pattern.
4. Choosing 'k'
So, how should you set k? Unfortunately, there is no universal rule. In practice, the optimal value depends on the complexity of the pattern to be learned, as well as the impact of noisy data.
One suggested rule of thumb is to start with k equal to the square root of the number of observations in the training data. For example, if the car had observed 100 previous road signs, you might set k to 10.
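In R, that starting point might be computed as follows, again assuming a hypothetical signs_train data frame of labeled examples.

    # Rule-of-thumb starting point: square root of the training set size
    k <- round(sqrt(nrow(signs_train)))  # 100 observations gives k = 10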
An even better approach is to test several different values of k and compare the classifier's performance on data it has not seen before.
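As a sketch, one way to do this is to try several values of k and measure accuracy on a held-out test set. The signs_train, signs_test, sign_types, and test_types objects are hypothetical.

    library(class)

    # Compare several neighborhood sizes on unseen data
    for (k in c(1, 5, 10, 15)) {
      pred <- knn(train = signs_train, test = signs_test,
                  cl = sign_types, k = k)
      cat("k =", k, "accuracy =", mean(pred == test_types), "\n")
    }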
5. Let's practice!
In the next coding exercise, you'll have an opportunity to see the impact of k on the vehicle's ability to correctly classify signs.