Get startedGet started for free

Top terms in movie clusters

Now that you have created a sparse matrix, generate cluster centers and print the top three terms in each cluster. Use the .todense() method to convert the sparse matrix, tfidf_matrix to a normal matrix for the kmeans() function to process. Then, use the .get_feature_names() method to get a list of terms in the tfidf_vectorizer object. The zip() function in Python joins two lists.

The tfidf_vectorizer object and sparse matrix, tfidf_matrix, from the previous have been retained in this exercise. kmeans has been imported from SciPy.

With a higher number of data points, the clusters formed would be defined more clearly. However, this requires some computational power, making it difficult to accomplish in an exercise here.

This exercise is part of the course

Cluster Analysis in Python

View Course

Exercise instructions

  • Generate cluster centers through the kmeans() function.
  • Generate a list of terms from the tfidf_vectorizer object.
  • Print top 3 terms of each cluster.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

num_clusters = 2

# Generate cluster centers through the kmeans function
cluster_centers, distortion = ____

# Generate terms from the tfidf_vectorizer object
terms = tfidf_vectorizer.____()

for i in range(num_clusters):
    # Sort the terms and print top 3 terms
    center_terms = dict(zip(____, ____))
    sorted_terms = sorted(____, key=center_terms.get, reverse=True)
    print(____)
Edit and Run Code