LoslegenKostenlos loslegen

Top terms in movie clusters

Now that you have created a sparse matrix, generate cluster centers and print the top three terms in each cluster. Use the .todense() method to convert the sparse matrix, tfidf_matrix to a normal matrix for the kmeans() function to process. Then, use the .get_feature_names() method to get a list of terms in the tfidf_vectorizer object. The zip() function in Python joins two lists.

The tfidf_vectorizer object and sparse matrix, tfidf_matrix, from the previous have been retained in this exercise. kmeans has been imported from SciPy.

With a higher number of data points, the clusters formed would be defined more clearly. However, this requires some computational power, making it difficult to accomplish in an exercise here.

Diese Übung ist Teil des Kurses

Cluster Analysis in Python

Kurs anzeigen

Anleitung zur Übung

  • Generate cluster centers through the kmeans() function.
  • Generate a list of terms from the tfidf_vectorizer object.
  • Print top 3 terms of each cluster.

Interaktive Übung

Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.

num_clusters = 2

# Generate cluster centers through the kmeans function
cluster_centers, distortion = ____

# Generate terms from the tfidf_vectorizer object
terms = tfidf_vectorizer.____()

for i in range(num_clusters):
    # Sort the terms and print top 3 terms
    center_terms = dict(zip(____, ____))
    sorted_terms = sorted(____, key=center_terms.get, reverse=True)
    print(____)
Code bearbeiten und ausführen