Inspecting your model
Now that you have built a "fake news" classifier, you'll investigate what it has learned. You can map the important vector weights back to actual words using some simple inspection techniques.
You have your well-performing TF-IDF Naive Bayes classifier available as nb_classifier, and the fitted vectorizer as tfidf_vectorizer.
This exercise is part of the course Introduction to Natural Language Processing in Python.
Exercise instructions
- Save the class labels as class_labels by accessing the .classes_ attribute of nb_classifier.
- Extract the features using the .get_feature_names() method of tfidf_vectorizer.
- Create a zipped array of the classifier coefficients with the feature names, and sort them by the coefficients. To do this, first use zip() with the arguments nb_classifier.coef_[0] and feature_names. Then, use sorted() on this.
- Print the top 20 weighted features for the first label of class_labels, and print the bottom 20 weighted features for the second label of class_labels. This has been done for you.
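The zip-and-sort step in the third instruction can be previewed on toy data. The coefficients and feature names below are invented stand-ins, not values from the course dataset:

```python
# Toy illustration of the zip()/sorted() pattern: pair each (hypothetical)
# coefficient with its feature name, then sort ascending by coefficient.
coefs = [-9.1, -7.3, -8.5]  # stand-ins for nb_classifier.coef_[0]
feature_names = ["alien", "economy", "miracle"]

# sorted() on tuples compares the first element (the weight) first,
# so the pairs come back ordered from smallest to largest weight.
feat_with_weights = sorted(zip(coefs, feature_names))
print(feat_with_weights)
# [(-9.1, 'alien'), (-8.5, 'miracle'), (-7.3, 'economy')]
```

Because the list is sorted ascending, slicing with [:20] gives the lowest-weighted features and [-20:] gives the highest-weighted ones.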
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Get the class labels: class_labels
class_labels = ____
# Extract the features: feature_names
feature_names = ____
# Zip the feature names together with the coefficient array and sort by weights: feat_with_weights
feat_with_weights = ____(____(____, ____))
# Print the first class label and the top 20 feat_with_weights entries
print(class_labels[0], feat_with_weights[:20])
# Print the second class label and the bottom 20 feat_with_weights entries
print(class_labels[1], feat_with_weights[-20:])