BaşlayınÜcretsiz Başlayın

Text classification using tf/idf vectors

Now that you've encoded the volunteer dataset's title column into tf/idf vectors, you'll use those vectors to predict the category_desc column.

Bu egzersiz

Preprocessing for Machine Learning in Python

kursunun bir parçasıdır
Kursu Görüntüle

Egzersiz talimatları

  • Split the text_tfidf vector and y target variable into training and test sets, setting the stratify parameter equal to y, since the class distribution is uneven. Notice that we have to run the .toarray() method on the tf/idf vector, in order to get in it the proper format for scikit-learn.
  • Fit the X_train and y_train data to the Naive Bayes model, nb.
  • Print out the test set accuracy.

Uygulamalı interaktif egzersiz

Bu örnek kodu tamamlayarak bu egzersizi bitirin.

# Split the dataset according to the class distribution of category_desc
y = volunteer["category_desc"]
X_train, X_test, y_train, y_test = ____(____.toarray(), ____, ____=____, random_state=42)

# Fit the model to the training data
nb.____(____, ____)

# Print out the model's accuracy
print(nb.____(____, ____))
Kodu Düzenle ve Çalıştır