BaşlayınÜcretsiz Başlayın

Clustering Wikipedia part II

It is now time to put your pipeline from the previous exercise to work! You are given an array articles of tf-idf word-frequencies of some popular Wikipedia articles, and a list titles of their titles. Use your pipeline to cluster the Wikipedia articles.

A solution to the previous exercise has been pre-loaded for you, so a Pipeline pipeline chaining TruncatedSVD with KMeans is available.

Bu egzersiz

Unsupervised Learning in Python

kursunun bir parçasıdır
Kursu Görüntüle

Egzersiz talimatları

  • Import pandas as pd.
  • Fit the pipeline to the word-frequency array articles.
  • Predict the cluster labels.
  • Align the cluster labels with the list titles of article titles by creating a DataFrame df with labels and titles as columns. This has been done for you.
  • Use the .sort_values() method of df to sort the DataFrame by the 'label' column, and print the result.
  • Hit submit and take a moment to investigate your amazing clustering of Wikipedia pages!

Uygulamalı interaktif egzersiz

Bu örnek kodu tamamlayarak bu egzersizi bitirin.

# Import pandas
____

# Fit the pipeline to articles
____

# Calculate the cluster labels: labels
labels = ____

# Create a DataFrame aligning labels and titles: df
df = pd.DataFrame({'label': labels, 'article': titles})

# Display df sorted by cluster label
print(____)
Kodu Düzenle ve Çalıştır