Comparing BoW and TF-IDF representations
You're part of the analytics team at a wearable tech company. Your goal is to help product managers understand customer feedback on the company's new smartwatch. You've already preprocessed the text and created two document-term matrices: bow_matrix with CountVectorizer(), which stores the raw count of each word in each review, and tfidf_matrix with TfidfVectorizer(), which rescales those counts so that words appearing in many reviews get lower weights. In this exercise, you'll visualize and compare the two to see how each captures word importance.
This exercise is part of the course Natural Language Processing (NLP) in Python.
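import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Assumes bow_matrix (sparse) and the fitted CountVectorizer `vectorizer`
# from the preprocessing step already exist
# Convert the sparse BoW matrix to a DataFrame with one column per term
df_bow = pd.DataFrame(
    bow_matrix.toarray(),
    columns=vectorizer.get_feature_names_out()
)

# Plot the heatmap of raw term counts
plt.figure(figsize=(10, 6))
sns.heatmap(df_bow, annot=True)
plt.title("BoW Scores Across Reviews")
plt.xlabel("Terms")
plt.xticks(rotation=45)
plt.ylabel("Documents")
plt.show()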