Modeling the UFO dataset, part 2
Finally, you'll build a model using the text vector we created, desc_tfidf, using the filtered_words list to create a filtered text vector. Let's see if you can predict the type of the sighting based on the text. You'll use a Naive Bayes model for this.
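As a rough sketch of what that filtering step does, the example below builds a tiny tf-idf matrix from three made-up descriptions and keeps only a few of its columns. The documents, the keep_words list, and the toy_ variable names are hypothetical stand-ins rather than the course's data, and the sketch assumes that filtered_words holds column indices of the tf-idf matrix.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up sighting descriptions (not the course dataset)
docs = [
    "bright light hovering over the lake",
    "silent triangle moving fast",
    "bright orange fireball in the sky",
]

vec = TfidfVectorizer()
toy_tfidf = vec.fit_transform(docs)  # sparse matrix, one column per vocabulary word

# Hypothetical choice of words to keep, mapped to their column indices
keep_words = ["bright", "triangle", "fireball"]
toy_filtered_words = {vec.vocabulary_[w] for w in keep_words}

# Index the columns with a list of indices, mirroring the exercise's pattern
toy_filtered_text = toy_tfidf[:, list(toy_filtered_words)]

print(toy_tfidf.shape)          # all vocabulary columns
print(toy_filtered_text.shape)  # only the three kept columns
```

Converting the set to a list matters here because sparse matrices accept a list of integer positions as a column index, but not a set.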
This exercise is part of the course
Preprocessing for Machine Learning in Python
Exercise instructions
- Filter the desc_tfidf vector by passing a list of filtered_words into the index.
- Split the filtered_text features and y, ensuring an equal class distribution in the training and test sets; use a random_state of 42.
- Use the nb model's .fit() to fit X_train and y_train.
- Print out the .score() of the nb model on the X_test and y_test sets.
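The split, fit, and score steps above can be sketched on made-up data as follows. This is an illustration under assumptions, not the exercise solution: toy_X and toy_y are hypothetical stand-ins for filtered_text and y, and nb is assumed to be a GaussianNB instance, which expects dense input and would explain the .toarray() call in the sample code.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Made-up labels: 32 rows of class 0 and 8 rows of class 1 (20% positives)
toy_y = np.array([0] * 32 + [1] * 8)
# Made-up sparse features; class 1 rows are shifted upward so the classes differ
toy_X = csr_matrix(rng.random((40, 5)) + toy_y[:, None])

# stratify=toy_y keeps the class proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    toy_X.toarray(), toy_y, stratify=toy_y, random_state=42
)

# Share of class 1 in each split
train_share = y_train.mean()
test_share = y_test.mean()

# Assumed model: Gaussian Naive Bayes
nb = GaussianNB()
nb.fit(X_train, y_train)
accuracy = nb.score(X_test, y_test)  # mean accuracy on the held-out rows

print(train_share, test_share, accuracy)
```

With the default test size of 25%, the 40 rows split into 30 training and 10 test rows, and stratification places the same 20% share of class 1 in each.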
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Use the list of filtered words we created to filter the text vector
filtered_text = ____[:, list(____)]
# Split the X and y sets using train_test_split, setting stratify=y
X_train, X_test, y_train, y_test = ____(____.toarray(), ____, ____, random_state=42)
# Fit nb to the training sets
____
# Print the score of nb on the test sets
____