Comparing BoW and TF-IDF representations
You're part of the analytics team at a wearable tech company. Your goal is to help product managers understand customer feedback on the company's new smartwatch. You've already preprocessed the text and created two representations: bow_matrix, built with CountVectorizer(), and tfidf_matrix, built with TfidfVectorizer(). In this exercise, you'll visualize and compare the two to better understand how each captures word importance.
This exercise is part of the course Natural Language Processing (NLP) in Python.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Convert BoW matrix to a DataFrame
df_bow = pd.DataFrame(
    ____,
    columns=vectorizer.____
)
# Plot the heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(____, annot=True)
plt.title("BoW Scores Across Reviews")
plt.xlabel("Terms")
plt.xticks(rotation=45)
plt.ylabel("Documents")
plt.show()