
Top terms in movie clusters

Now that you have created a sparse matrix, generate cluster centers and print the top three terms in each cluster. Use the .todense() method to convert the sparse matrix, tfidf_matrix, to a dense matrix that the kmeans() function can process. Then, use the .get_feature_names() method of the tfidf_vectorizer object to get its list of terms (in scikit-learn 1.2 and later, this method is named .get_feature_names_out()). The zip() function in Python pairs up the elements of two lists, which lets you map each term to its weight in a cluster center.
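The pairing and sorting step can be tried on its own before touching any clustering code. The snippet below is a minimal sketch: the terms and weights lists are made-up values for illustration, not output from the exercise.

```python
# Hypothetical terms and weights, invented for illustration only
terms = ["police", "love", "family", "murder"]
weights = [0.7, 0.1, 0.3, 0.6]

# zip() pairs each term with its weight; dict() turns the pairs into a lookup
center_terms = dict(zip(terms, weights))

# Sort the terms by their weight, largest first, and keep the top three
sorted_terms = sorted(center_terms, key=center_terms.get, reverse=True)
top_terms = sorted_terms[:3]

print(top_terms)  # ['police', 'murder', 'family']
```

Passing center_terms.get as the key tells sorted() to order the dictionary's keys by their associated values rather than alphabetically.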

The tfidf_vectorizer object and sparse matrix, tfidf_matrix, from the previous exercise have been retained in this exercise. kmeans has been imported from SciPy.

With a higher number of data points, the clusters formed would be more clearly defined. However, this requires more computational power than is practical in this exercise.

This exercise is part of the course

Cluster Analysis in Python


Exercise instructions

  • Generate cluster centers through the kmeans() function.
  • Generate a list of terms from the tfidf_vectorizer object.
  • Print top 3 terms of each cluster.

Hands-on interactive exercise

Try this exercise by completing the following sample code.

num_clusters = 2

# Generate cluster centers through the kmeans function
cluster_centers, distortion = ____

# Generate terms from the tfidf_vectorizer object
terms = tfidf_vectorizer.____()

for i in range(num_clusters):
    # Sort the terms and print top 3 terms
    center_terms = dict(zip(____, ____))
    sorted_terms = sorted(____, key=center_terms.get, reverse=True)
    print(____)
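For readers without access to the course environment, the same workflow can be sketched on toy data. This is one possible way to run the steps, not the official solution: terms and dense_matrix are hypothetical stand-ins for the term list from tfidf_vectorizer and for tfidf_matrix.todense(), and the numbers are invented so that two groups are easy to separate.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Hypothetical stand-in for the list of terms from tfidf_vectorizer
terms = ["police", "murder", "detective", "love", "wedding", "family"]

# Hypothetical stand-in for tfidf_matrix.todense(): one row per movie plot,
# one column per term. Values are invented for illustration.
dense_matrix = np.array([
    [0.7, 0.6, 0.4, 0.0, 0.0, 0.1],
    [0.6, 0.7, 0.3, 0.1, 0.0, 0.0],
    [0.8, 0.5, 0.5, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.7, 0.6, 0.4],
    [0.1, 0.0, 0.0, 0.6, 0.7, 0.3],
    [0.0, 0.0, 0.1, 0.8, 0.5, 0.5],
])

num_clusters = 2

# kmeans returns the cluster centers and the overall distortion
cluster_centers, distortion = kmeans(dense_matrix, num_clusters)

top_terms = []
# Iterate over the returned centers rather than assuming a fixed row count
for center in cluster_centers:
    # Map each term to its weight in this cluster center
    center_terms = dict(zip(terms, center))
    # Sort terms by weight, largest first, and keep the top 3
    sorted_terms = sorted(center_terms, key=center_terms.get, reverse=True)
    top_terms.append(sorted_terms[:3])
    print(sorted_terms[:3])
```

Because kmeans() starts from randomly chosen observations, the order in which the two clusters are printed may differ between runs.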