Top terms in movie clusters
Now that you have created a sparse matrix, generate cluster centers and print the top three terms in each cluster. Use the .todense() method to convert the sparse matrix, tfidf_matrix to a normal matrix for the kmeans() function to process. Then, use the .get_feature_names() method to get a list of terms in the tfidf_vectorizer object. The zip() function in Python joins two lists.
The tfidf_vectorizer object and sparse matrix, tfidf_matrix, from the previous have been retained in this exercise. kmeans has been imported from SciPy.
With a higher number of data points, the clusters formed would be defined more clearly. However, this requires some computational power, making it difficult to accomplish in an exercise here.
Cet exercice fait partie du cours
Cluster Analysis in Python
Instructions
- Generate cluster centers through the
kmeans()function. - Generate a list of terms from the
tfidf_vectorizerobject. - Print top 3 terms of each cluster.
Exercice interactif pratique
Essayez cet exercice en complétant cet exemple de code.
num_clusters = 2
# Generate cluster centers through the kmeans function
cluster_centers, distortion = ____
# Generate terms from the tfidf_vectorizer object
terms = tfidf_vectorizer.____()
for i in range(num_clusters):
# Sort the terms and print top 3 terms
center_terms = dict(zip(____, ____))
sorted_terms = sorted(____, key=center_terms.get, reverse=True)
print(____)