Modeling the UFO dataset, part 1
In this exercise, you'll build a k-nearest neighbors model to predict the country in which a UFO sighting took place. The `X` dataset contains the log-normalized `seconds` column, the one-hot encoded `type` columns, and the month and year of each sighting. The `y` labels are the encoded `country` column, where 1 is "us" and 0 is "ca".
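To make the description of the features and labels concrete, here is a minimal sketch of how a feature set like this could be assembled with pandas. The toy rows, the `date` column, and the derived names `seconds_log`, `month`, and `year` are assumptions made for illustration; they are not taken from the exercise's actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the UFO data (values are made up).
ufo = pd.DataFrame({
    "seconds": [60.0, 300.0, 1200.0, 15.0],
    "type": ["light", "circle", "light", "triangle"],
    "date": pd.to_datetime(["2011-11-03", "2004-06-16", "2010-08-28", "2013-01-10"]),
    "country": ["us", "ca", "us", "us"],
})

# Log-normalize the duration to reduce its variance.
ufo["seconds_log"] = np.log(ufo["seconds"])

# Extract the month and year of each sighting.
ufo["month"] = ufo["date"].dt.month
ufo["year"] = ufo["date"].dt.year

# One-hot encode the sighting type (one column per category).
type_dummies = pd.get_dummies(ufo["type"])

# Features: log duration, month, year, and the one-hot type columns.
X = pd.concat([ufo[["seconds_log", "month", "year"]], type_dummies], axis=1)

# Labels: 1 for "us", 0 for "ca".
y = (ufo["country"] == "us").astype(int)

print(X.columns)
```

In the exercise itself, `X` and `y` are already prepared, so this step is only background.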
This exercise is part of the course
Preprocessing for Machine Learning in Python
Exercise instructions
- Print out the `.columns` of the `X` set.
- Split the `X` and `y` sets, ensuring that the class distribution of the labels is the same in the training and test sets, and using a `random_state` of `42`.
- Fit `knn` to the training data.
- Print the test set accuracy of the `knn` model.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Take a look at the features in the X set of data
print(____)
# Split the X and y sets
X_train, X_test, y_train, y_test = ____
# Fit knn to the training sets
knn.____
# Print the score of knn on the test sets
print(____)
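The workflow in the template can be sketched end to end on synthetic data. This is an illustrative example under stated assumptions, not the exercise's reference solution: the arrays below are randomly generated stand-ins for `X` and `y`, and `knn` is created here with scikit-learn's default settings, whereas the exercise supplies its own `knn` object. Passing `stratify=y` to `train_test_split` is what keeps the class proportions the same in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): 160 samples of class 1, 40 of class 0.
rng = np.random.default_rng(0)
y = np.array([1] * 160 + [0] * 40)
X = rng.normal(size=(200, 3)) + y[:, None] * 1.5  # shift class 1 so the classes differ

# Stratified split: each split keeps the 80/20 class ratio of y.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# Assumption: a k-nearest neighbors classifier with default hyperparameters.
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)

# score() returns the mean accuracy on the given test data and labels.
accuracy = knn.score(X_test, y_test)
print(accuracy)
```

Without `stratify`, a random split of an imbalanced label set can leave the minority class under-represented in the test set, which makes the reported accuracy less reliable.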