Session Ready
Exercise

k-Nearest Neighbors: Predict

Having fit a k-NN classifier, you can now use it to predict the label of a new data point. However, there is no unlabeled data available since all of it was used to fit the model! You can still use the .predict() method on the X that was used to fit the model, but it is not a good indicator of the model's ability to generalize to new, unseen data.

In the next video, Hugo will discuss a solution to this problem. For now, a random unlabeled data point has been generated and is available to you as X_new. You will use your classifier to predict the label for this new data point, as well as on the training data X that the model has already seen. Using .predict() on X_new will generate 1 prediction, while using it on X will generate 435 predictions: 1 for each sample.

The DataFrame has been pre-loaded as df. This time, you will create the feature array X and target variable array y yourself.

Instructions
100 XP
  • Create arrays for the features and the target variable from df. As a reminder, the target variable is 'party'.
  • Instantiate a KNeighborsClassifier with 6 neighbors.
  • Fit the classifier to the data.
  • Predict the labels of the training data, X.
  • Predict the label of the new data point X_new.