Clustering Wikipedia part II
It is now time to put your pipeline from the previous exercise to work! You are given an array articles of tf-idf word-frequencies of some popular Wikipedia articles, and a list titles of their titles. Use your pipeline to cluster the Wikipedia articles.
A solution to the previous exercise has been pre-loaded for you, so a Pipeline pipeline chaining TruncatedSVD with KMeans is available.
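For reference, here is a minimal sketch of how a pipeline like the pre-loaded one could be built and used. It does not reproduce the previous exercise's solution. The toy corpus, `n_components=3` and `n_clusters=3` are assumptions chosen to fit this tiny example, not the settings used for the Wikipedia data. `TruncatedSVD` accepts scipy sparse matrices directly, which is why it pairs naturally with tf-idf arrays.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the pre-loaded Wikipedia articles
documents = [
    "football club league goal striker",
    "league goal match football cup",
    "virus infection vaccine disease",
    "vaccine disease symptoms infection",
    "stock market shares investors",
    "investors market trading shares",
]

# tf-idf word frequencies, returned as a scipy sparse (CSR) matrix
articles = TfidfVectorizer().fit_transform(documents)

# Assumed settings for this toy data: 3 SVD components, 3 clusters
svd = TruncatedSVD(n_components=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
pipeline = make_pipeline(svd, kmeans)

# Fit reduces dimensionality and then clusters; predict reuses both steps
pipeline.fit(articles)
labels = pipeline.predict(articles)
print(labels)
```

The label numbers KMeans assigns are arbitrary: only which documents share a label carries meaning.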
This exercise is part of the course Unsupervised Learning in Python.
Exercise instructions
- Import pandas as pd.
- Fit the pipeline to the word-frequency array articles.
- Predict the cluster labels.
- Align the cluster labels with the list titles of article titles by creating a DataFrame df with labels and titles as columns. This has been done for you.
- Use the .sort_values() method of df to sort the DataFrame by the 'label' column, and print the result.
- Hit submit and take a moment to investigate your amazing clustering of Wikipedia pages!
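The align-and-sort steps above can be illustrated on a small DataFrame. The labels and titles below are hypothetical placeholders, not real clustering output. Passing `kind='stable'` to `sort_values` is an optional choice that keeps articles in their original order within each cluster.

```python
import pandas as pd

# Hypothetical cluster labels and titles (illustration only, not real output)
labels = [2, 0, 1, 0, 2]
titles = ["Title A", "Title B", "Title C", "Title D", "Title E"]

# Each row pairs a title with its label, so both lists must be in the same order
df = pd.DataFrame({'label': labels, 'article': titles})

# Sort rows by cluster label; kind='stable' preserves order within each label
df_sorted = df.sort_values('label', kind='stable')
print(df_sorted)
```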
Hands-on interactive exercise
Finish this exercise by completing the sample code.
# Import pandas
import pandas as pd
# Fit the pipeline to articles (pipeline, articles and titles are pre-loaded)
pipeline.fit(articles)
# Calculate the cluster labels: labels
labels = pipeline.predict(articles)
# Create a DataFrame aligning labels and titles: df
df = pd.DataFrame({'label': labels, 'article': titles})
# Display df sorted by cluster label
print(df.sort_values('label'))
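When investigating the result, it can help to list the titles in each cluster side by side rather than scanning the sorted table. A minimal sketch, again using hypothetical labels and titles in place of the real `df`:

```python
import pandas as pd

# Hypothetical labelled articles (illustration only, not real output)
df = pd.DataFrame({
    'label': [2, 0, 1, 0, 2],
    'article': ["Title A", "Title B", "Title C", "Title D", "Title E"],
})

# Collect the titles belonging to each cluster into one list per label
clusters = df.groupby('label')['article'].apply(list)
for label, titles_in_cluster in clusters.items():
    print(label, titles_in_cluster)
```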