Clustering Wikipedia part I

In the video, you saw that TruncatedSVD can perform a PCA-like dimension reduction on sparse arrays in csr_matrix format, such as word-frequency arrays. Unlike PCA, it does not center the data first, which is why it can work on sparse matrices efficiently. Combine what you know about TruncatedSVD and k-means to cluster some popular pages from Wikipedia. In this exercise, you'll build the pipeline. In the next exercise, you'll apply it to the word-frequency array of some Wikipedia articles.
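To see why a csr_matrix works as input, here is a minimal sketch, assuming a few hypothetical toy documents in place of real Wikipedia articles. TfidfVectorizer is used only to produce a small sparse word-frequency matrix (the exercise itself does not need it); TruncatedSVD then reduces that matrix without any conversion to a dense array.

```python
from scipy.sparse import issparse
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy documents standing in for Wikipedia articles
documents = [
    "cats chase mice",
    "dogs chase cats",
    "stocks rise and fall",
    "markets fall when stocks fall",
]

# TfidfVectorizer returns a sparse word-frequency matrix in CSR format
tfidf = TfidfVectorizer()
word_freq = tfidf.fit_transform(documents)

# TruncatedSVD accepts the sparse matrix as-is (no .toarray() call needed)
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(word_freq)

print(issparse(word_freq), word_freq.format)  # True csr
print(word_freq.shape)  # (4, 10): 4 documents, 10 distinct words
print(reduced.shape)    # (4, 2): 2 components per document
```

The number of components must not exceed the number of columns (words), which is why this toy example uses 2 rather than the 50 used in the exercise.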

Create a Pipeline object consisting of a TruncatedSVD followed by KMeans. (This time, the word-frequency matrix has been precomputed for you, so there's no need for a TfidfVectorizer.)

The Wikipedia dataset you will be working with was obtained from here.

This exercise is part of the course Unsupervised Learning in Python.


Exercise instructions

- Import:
  - TruncatedSVD from sklearn.decomposition.
  - KMeans from sklearn.cluster.
  - make_pipeline from sklearn.pipeline.
- Create a TruncatedSVD instance called svd with n_components=50.
- Create a KMeans instance called kmeans with n_clusters=6.
- Create a pipeline called pipeline consisting of svd and kmeans.

Hands-on interactive exercise

Complete the sample code to finish this exercise.

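# Perform the necessary imports
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

# Create a TruncatedSVD instance with 50 components: svd
svd = TruncatedSVD(n_components=50)

# Create a KMeans instance with 6 clusters: kmeans
kmeans = KMeans(n_clusters=6)

# Create a pipeline that applies svd first, then kmeans: pipeline
pipeline = make_pipeline(svd, kmeans)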
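A sketch of how such a pipeline could be fitted and used, assuming a hypothetical random sparse matrix `articles` (60 rows, 1000 columns) in place of the precomputed Wikipedia word-frequency matrix, which is not available here. The random_state and n_init values are illustrative choices added for reproducibility, not part of the exercise.

```python
from scipy.sparse import random as sparse_random
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the word-frequency matrix:
# 60 "articles" x 1000 "words", about 5% non-zero entries, CSR format
articles = sparse_random(60, 1000, density=0.05, format="csr", random_state=42)

# Same structure as the exercise: reduce to 50 dimensions, then k-means with 6 clusters
svd = TruncatedSVD(n_components=50, random_state=0)
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
pipeline = make_pipeline(svd, kmeans)

# Fitting runs svd.fit_transform, then kmeans.fit on the 50-dimensional output
pipeline.fit(articles)

# predict transforms with svd, then assigns each article to a cluster
labels = pipeline.predict(articles)

print(labels.shape)                # (60,): one label per article
print(set(labels.tolist()))        # cluster ids drawn from 0..5
print(list(pipeline.named_steps))  # ['truncatedsvd', 'kmeans']
```

make_pipeline names each step after its lowercased class name, so the fitted components can be reached as `pipeline.named_steps["truncatedsvd"]` and `pipeline.named_steps["kmeans"]`.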


Learn how to discover the underlying groups (or "clusters") in a dataset. By the end of this chapter, you'll be clustering companies using their stock market prices, and distinguishing different species by clustering their measurements.

- Exercise 1: Unsupervised Learning
- Exercise 2: How many clusters?
- Exercise 3: Clustering 2D points
- Exercise 4: Inspect your clustering
- Exercise 5: Evaluating a clustering
- Exercise 6: How many clusters of grain?
- Exercise 7: Evaluating the grain clustering
- Exercise 8: Transforming features for better clusterings
- Exercise 9: Scaling fish data for clustering
- Exercise 10: Clustering the fish data
- Exercise 11: Clustering stocks using KMeans
- Exercise 12: Which stocks move together?

In this chapter, you'll learn about two unsupervised learning techniques for data visualization, hierarchical clustering and t-SNE. Hierarchical clustering merges the data samples into ever-coarser clusters, yielding a tree visualization of the resulting cluster hierarchy. t-SNE maps the data samples into 2d space so that the proximity of the samples to one another can be visualized.

- Exercise 1: Visualizing hierarchies
- Exercise 2: How many merges?
- Exercise 3: Hierarchical clustering of the grain data
- Exercise 4: Hierarchies of stocks
- Exercise 5: Cluster labels in hierarchical clustering
- Exercise 6: Which clusters are closest?
- Exercise 7: Different linkage, different hierarchical clustering!
- Exercise 8: Intermediate clusterings
- Exercise 9: Extracting the cluster labels
- Exercise 10: t-SNE for 2-dimensional maps
- Exercise 11: t-SNE visualization of grain dataset
- Exercise 12: A t-SNE map of the stock market
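The hierarchical clustering described in the chapter summary above can be sketched with SciPy, assuming six hypothetical 2D samples that form two obvious groups. `linkage` records the successive merges, and `fcluster` cuts the resulting hierarchy at a chosen distance to obtain flat cluster labels. Only the hierarchical part is sketched here, not t-SNE.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical 2D samples: two tight groups far apart
samples = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

# Agglomerative clustering: each row of `merges` describes one merge of two clusters
merges = linkage(samples, method="complete")

# Cut the hierarchy at height 1.0 to get flat cluster labels (numbered from 1)
labels = fcluster(merges, t=1.0, criterion="distance")

print(merges.shape)  # (5, 4): n_samples - 1 merges
print(labels)
```

With six samples there are always five merges, one fewer than the number of samples, because each merge reduces the number of clusters by one until a single cluster remains.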

Dimension reduction summarizes a dataset using its commonly occurring patterns. In this chapter, you'll learn about the most fundamental of dimension reduction techniques, "Principal Component Analysis" ("PCA"). PCA is often used before supervised learning to improve model performance and generalization. It can also be useful for unsupervised learning. For example, you'll employ a variant of PCA that will allow you to cluster Wikipedia articles by their content!

- Exercise 1: Visualizing the PCA transformation
- Exercise 2: Correlated data in nature
- Exercise 3: Decorrelating the grain measurements with PCA
- Exercise 4: Principal components
- Exercise 5: Intrinsic dimension
- Exercise 6: The first principal component
- Exercise 7: Variance of the PCA features
- Exercise 8: Intrinsic dimension of the fish data
- Exercise 9: Dimension reduction with PCA
- Exercise 10: Dimension reduction of the fish measurements
- Exercise 11: A tf-idf word-frequency array
- Exercise 12: Clustering Wikipedia part I
- Exercise 13: Clustering Wikipedia part II
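The idea that PCA summarizes a dataset by its common patterns can be sketched as follows, assuming hypothetical correlated 2D data in which the second feature is roughly twice the first. Because one direction carries almost all of the variance, keeping a single principal component loses very little information.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical correlated data: second feature is about 2x the first plus small noise
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

# Keep only the first principal component
pca = PCA(n_components=1)
reduced = pca.fit_transform(data)

print(reduced.shape)                  # (200, 1)
print(pca.explained_variance_ratio_)  # close to 1
```

The explained variance ratio shows what fraction of the total variance the retained component captures; a value near 1 suggests the data's intrinsic dimension here is 1.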

In this chapter, you'll learn about a dimension reduction technique called "Non-negative matrix factorization" ("NMF") that expresses samples as combinations of interpretable parts. For example, it expresses documents as combinations of topics, and images in terms of commonly occurring visual patterns. You'll also learn to use NMF to build recommender systems that can find you similar articles to read, or musical artists that match your listening history!

- Exercise 1: Non-negative matrix factorization (NMF)
- Exercise 2: Non-negative data
- Exercise 3: NMF applied to Wikipedia articles
- Exercise 4: NMF features of the Wikipedia articles
- Exercise 5: NMF reconstructs samples
- Exercise 6: NMF learns interpretable parts
- Exercise 7: NMF learns topics of documents
- Exercise 8: Explore the LED digits dataset
- Exercise 9: NMF learns the parts of images
- Exercise 10: PCA doesn't learn parts
- Exercise 11: Building recommender systems using NMF
- Exercise 12: Which articles are similar to 'Cristiano Ronaldo'?
- Exercise 13: Recommend musical artists part I
- Exercise 14: Recommend musical artists part II
- Exercise 15: Final thoughts
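The NMF chapter summary can be sketched on a hypothetical 4x5 matrix of non-negative word counts. NMF factors the matrix into non-negative document-by-topic weights `W` and topic-by-word parts `H`. One simple way to turn this into a recommender, assumed here for illustration, is to compare the L2-normalized rows of `W` with cosine similarity.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.preprocessing import normalize

# Hypothetical non-negative word counts: 4 documents x 5 words
# (documents 0-1 mostly use words 0-1, documents 2-3 mostly words 2-4)
X = np.array([
    [3, 2, 0, 0, 0],
    [2, 3, 0, 0, 1],
    [0, 0, 4, 3, 2],
    [0, 1, 3, 4, 2],
], dtype=float)

# Factor X ≈ W @ H with 2 non-negative components ("topics")
model = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0)
W = model.fit_transform(X)  # document-by-topic weights
H = model.components_       # topic-by-word parts

# Cosine similarity of every document to document 0, using normalized NMF features
norm_W = normalize(W)
similarities = norm_W @ norm_W[0]

print(W.shape, H.shape)  # (4, 2) (2, 5)
print(similarities.round(2))
```

Document 1 shares document 0's vocabulary, so it should score much higher than documents 2 and 3. That similarity ranking is what a content-based recommender would use.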