Modeling the UFO dataset, part 1
In this exercise, you're going to build a k-nearest neighbor model to predict which country the UFO sighting took place in. The X dataset contains the log-normalized seconds column, the one-hot encoded type columns, as well as the month and year when the sighting took place. The y labels are the encoded country column, where 1 is "us" and 0 is "ca".
This exercise is part of the course Preprocessing for Machine Learning in Python.
Exercise instructions
- Print out the .columns of the X set.
- Split the X and y sets, ensuring that the class distribution of the labels is the same in the training and test sets, and using a random_state of 42.
- Fit knn to the training data.
- Print the test set accuracy of the knn model.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Take a look at the features in the X set of data
print(____)
# Split the X and y sets
X_train, X_test, y_train, y_test = ____
# Fit knn to the training sets
knn.____
# Print the score of knn on the test sets
print(____)
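
One way the blanks above might be filled in is sketched below. It is not the course's reference solution. Because the UFO data and the pre-built knn object are not included here, the sketch creates a small synthetic stand-in: the column names (seconds_log, month, year), the random values, and n_neighbors=5 are all assumptions made for illustration. The calls themselves (train_test_split with stratify=y, fit, and score) are standard scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the UFO features and labels (hypothetical values and
# column names; the real exercise supplies X, y, and knn for you).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "seconds_log": rng.normal(size=n),
    "month": rng.integers(1, 13, size=n),
    "year": rng.integers(1990, 2015, size=n),
})
y = pd.Series((rng.random(n) < 0.8).astype(int), name="country_enc")  # 1 = "us", 0 = "ca"

# Take a look at the features in the X set of data
print(X.columns)

# Split the X and y sets; stratify=y keeps the class proportions of y
# (approximately) equal in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# Assumed model settings; in the exercise, knn already exists
knn = KNeighborsClassifier(n_neighbors=5)

# Fit knn to the training sets
knn.fit(X_train, y_train)

# Print the score (mean accuracy) of knn on the test sets
accuracy = knn.score(X_test, y_test)
print(accuracy)
```

With the default test_size, train_test_split holds out 25% of the rows for testing. Passing stratify=y matters here because the labels are imbalanced: without it, a random split could leave the test set with noticeably more or fewer "ca" sightings than the training set.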