Text classification using tf/idf vectors
Now that you've encoded the volunteer dataset's title column into tf/idf vectors, you'll use those vectors to predict the category_desc column.
This exercise is part of the course
Preprocessing for Machine Learning in Python
Exercise instructions
- Split the text_tfidf vector and the y target variable into training and test sets, setting the stratify parameter equal to y, since the class distribution is uneven. Notice that we have to run the .toarray() method on the tf/idf vector to get it into the proper format for scikit-learn.
- Fit the X_train and y_train data to the Naive Bayes model, nb.
- Print out the test set accuracy.
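To make the first instruction concrete, here is a minimal sketch of what .toarray() and stratify do. The titles and labels are hypothetical toy data, not the volunteer dataset, and the explicit test_size=0.25 is an assumption that simply restates scikit-learn's default.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy data with an uneven class distribution (8 vs. 4)
titles = [
    "Tutor math students after school",
    "Reading buddy for first graders",
    "Teach adult literacy class",
    "Homework help volunteer",
    "Mentor high school students",
    "Assist in classroom reading hour",
    "College essay coaching volunteer",
    "Teach computer skills to seniors",
    "Plant trees in the park",
    "Beach cleanup volunteer",
    "Community garden planting day",
    "Park trail cleanup crew",
]
labels = ["Education"] * 8 + ["Environment"] * 4

# fit_transform returns a SciPy sparse matrix; .toarray() makes it a dense NumPy array
text_tfidf = TfidfVectorizer().fit_transform(titles)
dense = text_tfidf.toarray()

# stratify=labels keeps the 2:1 class ratio in both the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    dense, labels, test_size=0.25, stratify=labels, random_state=42
)

train_counts = Counter(y_train)
test_counts = Counter(y_test)
print(train_counts)
print(test_counts)
```

Without stratify, a small or rare class could end up entirely in one of the two sets, which would make the test accuracy misleading.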
Hands-on interactive exercise
Try this exercise by completing the sample code below.
# Split the dataset according to the class distribution of category_desc
y = volunteer["category_desc"]
X_train, X_test, y_train, y_test = ____(____.toarray(), ____, ____=____, random_state=42)
# Fit the model to the training data
nb.____(____, ____)
# Print out the model's accuracy
print(nb.____(____, ____))
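For reference, the same split, fit, and score workflow can be sketched end to end on toy data. This is an illustration under stated assumptions rather than the exercise solution: the titles and categories are hypothetical, and nb is assumed to be a GaussianNB instance, since the exercise only describes it as a Naive Bayes model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-ins for volunteer["title"] and volunteer["category_desc"]
titles = [
    "Tutor math students after school",
    "Reading buddy for first graders",
    "Teach adult literacy class",
    "Homework help volunteer",
    "Mentor high school students",
    "Assist in classroom reading hour",
    "College essay coaching volunteer",
    "Teach computer skills to seniors",
    "Plant trees in the park",
    "Beach cleanup volunteer",
    "Community garden planting day",
    "Park trail cleanup crew",
]
y = ["Education"] * 8 + ["Environment"] * 4

# Encode the titles as tf/idf vectors (a sparse matrix)
text_tfidf = TfidfVectorizer().fit_transform(titles)

# Split on the dense array, preserving the class distribution of y
X_train, X_test, y_train, y_test = train_test_split(
    text_tfidf.toarray(), y, stratify=y, random_state=42
)

# Assumption: nb is a Gaussian Naive Bayes classifier, which needs dense input
nb = GaussianNB()
nb.fit(X_train, y_train)

# score() returns the mean accuracy on the given test data and labels
accuracy = nb.score(X_test, y_test)
print(accuracy)
```

On a dataset this small the printed accuracy says little about real performance; the point is only the order of the calls and which arguments each one takes.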