Modeling the UFO dataset, part 2
Finally, you'll build a model from the text vector we created, desc_tfidf, using filtered_words to create a filtered text vector. Let's see whether you can predict the type of sighting from its text description. You'll use a Naive Bayes model for this.
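The filtering step selects a subset of columns (words) from the sparse TF-IDF matrix. The sketch below shows this on made-up descriptions. It assumes, as the template's `list(...)` call suggests, that filtered_words is a set of integer column indices into the vectorizer's vocabulary; the toy documents and the chosen words are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy descriptions standing in for the UFO "desc" column (hypothetical data)
docs = [
    "bright light moving fast across the sky",
    "orange light hovering over the lake",
    "silver disk hovering then moving fast",
]

vectorizer = TfidfVectorizer()
desc_tfidf = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)

# Assumption: filtered_words is a set of integer column indices to keep
filtered_words = {vectorizer.vocabulary_["light"], vectorizer.vocabulary_["hovering"]}

# Convert the set to a list so it can be used as a column index
filtered_text = desc_tfidf[:, list(filtered_words)]

print(desc_tfidf.shape, filtered_text.shape)  # (3, 14) (3, 2)
```

Every row (sighting) is kept; only the columns outside filtered_words are dropped, so the model sees fewer, hopefully more informative, features.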
This exercise is part of the course
Preprocessing for Machine Learning in Python
Exercise instructions
- Filter the desc_tfidf vector by passing a list of filtered_words into the index.
- Split the filtered_text features and y, ensuring an equal class distribution in the training and test sets; use a random_state of 42.
- Use the nb model's .fit() to fit X_train and y_train.
- Print out the .score() of the nb model on the X_test and y_test sets.
Have a go at this exercise by completing this sample code.
# Use the list of filtered words we created to filter the text vector
filtered_text = ____[:, list(____)]
# Split the X and y sets using train_test_split, setting stratify=y
X_train, X_test, y_train, y_test = ____(____.toarray(), ____, ____, random_state=42)
# Fit nb to the training sets
____
# Print the score of nb on the test sets
____
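For reference, here is a self-contained sketch of the whole workflow on a tiny made-up dataset rather than the real UFO data. It assumes nb is scikit-learn's GaussianNB and that filtered_words is a set of column indices; the documents, labels, and kept words are all hypothetical. GaussianNB does not accept sparse input, which is why the filtered matrix is converted with .toarray() before splitting.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy sightings; labels play the role of the sighting "type"
docs = [
    "bright light in the sky",
    "red light blinking in the sky",
    "white light moving slowly",
    "light flashing over the trees",
    "metal disk hovering silently",
    "silver disk over the road",
    "large disk with windows",
    "disk shaped craft hovering",
]
y = ["light"] * 4 + ["disk"] * 4

vectorizer = TfidfVectorizer()
desc_tfidf = vectorizer.fit_transform(docs)

# Assumption: filtered_words is a set of column indices for the words to keep
keep = ["light", "disk", "sky", "hovering"]
filtered_words = {vectorizer.vocabulary_[w] for w in keep}

# Filter the text vector down to the kept columns
filtered_text = desc_tfidf[:, list(filtered_words)]

# Dense input for GaussianNB; stratify=y keeps the classes balanced in both splits
X_train, X_test, y_train, y_test = train_test_split(
    filtered_text.toarray(), y, stratify=y, random_state=42
)

nb = GaussianNB()
nb.fit(X_train, y_train)
score = nb.score(X_test, y_test)  # mean accuracy on the test set
print(score)
```

On a dataset this small the score says little; the point is the order of operations: filter columns, densify, split with stratification, fit, then score on held-out data.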