Modeling the UFO dataset, part 2

Finally, you'll build a model on the text vector created earlier, desc_tfidf, using the filtered_words list to select a filtered subset of its columns. Let's see if you can predict the type of sighting from the description text. You'll use a Naive Bayes model for this.

This exercise is part of the course

Preprocessing for Machine Learning in Python

Exercise instructions

  • Filter the desc_tfidf vector by passing a list of filtered_words into the index.
  • Split the filtered_text features and y, ensuring an equal class distribution in the training and test sets; use a random_state of 42.
  • Use the nb model's .fit() to fit X_train and y_train.
  • Print out the .score() of the nb model on the X_test and y_test sets.
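
To see what "an equal class distribution" means in practice, here is a minimal sketch of a stratified split. The labels and features are hypothetical placeholders, not the UFO data; only train_test_split and its stratify and random_state parameters come from the exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 "light" sightings and 20 "circle" sightings
y = np.array(["light"] * 80 + ["circle"] * 20)
X = np.arange(100).reshape(-1, 1)  # placeholder features, one column

# stratify=y asks for the same class proportions in both resulting sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

def class_share(labels, name):
    """Return the fraction of labels equal to name."""
    return float(np.mean(labels == name))

train_share = class_share(y_train, "circle")
test_share = class_share(y_test, "circle")
print("circle share in train:", train_share)
print("circle share in test: ", test_share)
```

Without stratify, a rare class could by chance end up over- or under-represented in the test set, which makes the score harder to interpret.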

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Use the list of filtered words we created to filter the text vector
filtered_text = ____[:, list(____)]

# Split the X and y sets using train_test_split, setting stratify=y 
X_train, X_test, y_train, y_test = ____(____.toarray(), ____, ____, random_state=42)

# Fit nb to the training sets
____

# Print the score of nb on the test sets
____
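
If you want to try the same steps outside the exercise environment, the sketch below runs them end to end on made-up data. Everything here is an assumption for illustration: the descriptions, the labels, the choice of filtered words, and the use of GaussianNB for nb (the exercise does not show how nb, desc_tfidf, or filtered_words were created). It is not the exercise's solution on the real UFO dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for the sighting descriptions and their types
descriptions = [
    "bright light hovering over the field",
    "a bright light moved slowly across the sky",
    "flashing light seen above the lake",
    "steady orange light in the night sky",
    "bright white light that vanished quickly",
    "small light blinking near the horizon",
    "metallic disk spinning above the trees",
    "silver disk shaped object flying low",
    "large metallic disk with a dome",
    "flat disk hovering silently over the road",
    "metallic disk seen during the day",
    "dark disk moving fast toward the hills",
]
y = ["light"] * 6 + ["disk"] * 6

# Build a tf-idf text vector (a sparse matrix), standing in for desc_tfidf
vec = TfidfVectorizer()
desc_tfidf = vec.fit_transform(descriptions)

# Assumed form of filtered_words: a set of column indices to keep
filtered_words = {vec.vocabulary_[w] for w in ("light", "bright", "disk", "metallic")}

# Keep only the selected columns of the text vector
filtered_text = desc_tfidf[:, list(filtered_words)]

# Stratified split; .toarray() converts the sparse matrix to a dense array
X_train, X_test, y_train, y_test = train_test_split(
    filtered_text.toarray(), y, stratify=y, random_state=42
)

# Assumption: nb is a Gaussian Naive Bayes model, which needs dense input
nb = GaussianNB()
nb.fit(X_train, y_train)

# Mean accuracy on the held-out test set
score = nb.score(X_test, y_test)
print(score)
```

With so few rows the score says little about real performance; the point is only the order of the steps: filter columns, split with stratification, fit, then score.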