Modeling the UFO dataset, part 1

In this exercise, you're going to build a k-nearest neighbors model to predict the country in which each UFO sighting took place. The X dataset contains the log-normalized seconds column, the one-hot encoded type columns, and the month and year of each sighting. The y labels are the encoded country column, where 1 is "us" and 0 is "ca".
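To make the shape of X and y concrete, here is a minimal sketch of how such features might be assembled. The rows and the column names (seconds, type, country, month, year, seconds_log) are hypothetical stand-ins, not the actual UFO data or the course's own preprocessing code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows; the real UFO dataset is not reproduced here.
ufo = pd.DataFrame({
    "seconds": [20.0, 300.0, 5.0, 1200.0],
    "type": ["light", "circle", "light", "triangle"],
    "country": ["us", "ca", "us", "us"],
    "month": [6, 11, 7, 1],
    "year": [2004, 1998, 2010, 2012],
})

# Log-normalize the duration to reduce the spread of the column.
ufo["seconds_log"] = np.log(ufo["seconds"])

# One-hot encode the sighting type (one indicator column per type).
type_dummies = pd.get_dummies(ufo["type"])

# Features: log seconds, month, year, and the one-hot type columns.
X = pd.concat([ufo[["seconds_log", "month", "year"]], type_dummies], axis=1)

# Labels: 1 for "us", 0 for "ca".
y = (ufo["country"] == "us").astype(int)
```

In the exercise itself, X and y are already prepared, so this step is only an illustration of what they contain.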

This exercise is part of the course

Preprocessing for Machine Learning in Python


Exercise instructions

  • Print out the .columns of the X set.
  • Split the X and y sets, ensuring that the class distribution of the labels is the same in the training and test sets, and using a random_state of 42.
  • Fit knn to the training data.
  • Print the test set accuracy of the knn model.
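The second instruction asks for the same class distribution in both splits. One way to get this in scikit-learn is the stratify argument of train_test_split. The sketch below uses synthetic, deliberately imbalanced labels (X_demo and y_demo are made-up names and data) to show the effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 90% class 1, 10% class 0 (not the real UFO data).
y_demo = np.array([1] * 180 + [0] * 20)
X_demo = np.arange(200).reshape(-1, 1)

# stratify=y_demo asks the splitter to preserve the label proportions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=42
)

# Share of class 1 in each split; both should stay close to 0.9.
train_share = y_tr.mean()
test_share = y_te.mean()
print(train_share, test_share)
```

Without stratify, a random split of a small or imbalanced dataset can leave the minority class under-represented in the test set, which makes the accuracy estimate less reliable.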

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Take a look at the features in the X set of data
print(____)

# Split the X and y sets
X_train, X_test, y_train, y_test = ____

# Fit knn to the training sets
knn.____

# Print the score of knn on the test sets
print(____)
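For reference, the four steps could be filled in along the following lines. This is a sketch rather than the official solution: the exercise environment provides X, y, and knn, so the random stand-in data and the n_neighbors=5 setting below are assumptions made only to keep the example runnable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Random stand-in data (assumption); the exercise supplies the real X and y.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "seconds_log": rng.normal(5.0, 1.5, n),
    "light": rng.integers(0, 2, n),
    "month": rng.integers(1, 13, n),
    "year": rng.integers(1990, 2015, n),
})
y = pd.Series(rng.integers(0, 2, n), name="country_enc")

# The exercise provides knn already; n_neighbors=5 is an assumed setting.
knn = KNeighborsClassifier(n_neighbors=5)

# Take a look at the features in the X set of data
print(X.columns)

# Split the X and y sets, keeping the label distribution equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# Fit knn to the training data
knn.fit(X_train, y_train)

# Print the accuracy of knn on the test set
accuracy = knn.score(X_test, y_test)
print(accuracy)
```

Because the stand-in labels are random, the printed accuracy here says nothing about how the model performs on the actual UFO data.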