Clustering stocks using KMeans

In this exercise, you'll cluster companies using their daily stock price movements (that is, the difference in US dollars between the closing and opening prices for each trading day). You are given a NumPy array named movements containing daily price movements from 2010 to 2015 (obtained from Yahoo! Finance). Each row corresponds to a company, and each column corresponds to a trading day.
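To make the layout of such an array concrete, here is a minimal sketch of how daily movements could be derived from opening and closing prices. The prices and the names open_prices, close_prices, and movements_example are made up for illustration; they are not the course data.

```python
import numpy as np

# Hypothetical opening and closing prices in US dollars:
# 2 companies (rows) x 3 trading days (columns).
open_prices = np.array([[10.0, 11.0, 10.5],
                        [200.0, 198.0, 205.0]])
close_prices = np.array([[11.0, 10.5, 10.5],
                         [198.0, 205.0, 204.0]])

# Daily movement = closing price minus opening price.
movements_example = close_prices - open_prices

print(movements_example.shape)  # (companies, trading days)
print(movements_example)
```

Note that the second, more expensive company shows larger absolute movements, which is the scale difference the exercise goes on to address.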

Some stocks are more expensive than others. To account for this, include a Normalizer at the beginning of your pipeline. The Normalizer separately rescales each company's price movements to a relative scale before the clustering begins.

Note that Normalizer() is different from StandardScaler(), which you used in the previous exercise. While StandardScaler() standardizes features (such as the features of the fish data from the previous exercise) by removing the mean and scaling to unit variance, Normalizer() rescales each sample (here, the price movements of a single company) independently of the others.
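The difference can be illustrated with a small sketch on made-up data (the array X below is an assumption, not course data). By default, Normalizer rescales each row to unit Euclidean norm, whereas StandardScaler works column by column.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Made-up data: 3 samples (rows) x 2 features (columns).
# The second row is the first row multiplied by 10.
X = np.array([[3.0, 4.0],
              [30.0, 40.0],
              [1.0, 0.0]])

# Normalizer rescales each row (sample) to unit L2 norm by default.
X_norm = Normalizer().fit_transform(X)
row_norms = np.linalg.norm(X_norm, axis=1)
print(row_norms)

# StandardScaler rescales each column (feature) to mean 0 and variance 1.
X_std = StandardScaler().fit_transform(X)
print(X_std.mean(axis=0), X_std.std(axis=0))
```

After normalization the first two rows are identical, which is why a cheap and an expensive stock with the same movement pattern can end up in the same cluster.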

KMeans and make_pipeline have already been imported for you.

This exercise is part of the course

<Kurs>Unsupervised Learning in Python</Kurs>

Exercise instructions

Import Normalizer from sklearn.preprocessing.
Create an instance of Normalizer called normalizer.
Create an instance of KMeans called kmeans with 10 clusters.
Using make_pipeline(), create a pipeline called pipeline that chains normalizer and kmeans.
Fit the pipeline to the movements array.

Interactive hands-on exercise

Try this exercise by completing this sample code.

# Import Normalizer
____

# Create a normalizer: normalizer
normalizer = ____

# Create a KMeans model with 10 clusters: kmeans
kmeans = ____

# Make a pipeline chaining normalizer and kmeans: pipeline
pipeline = ____

# Fit pipeline to the daily price movements
____
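One possible way to fill in the blanks is sketched below. It is self-contained, so it brings its own imports and replaces the course's movements array with random stand-in data (an assumption made only so that the sketch runs). The n_init and random_state arguments are likewise additions for reproducibility and are not required by the exercise.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Stand-in for the course's movements array (assumption: random data,
# 60 companies x 100 trading days), used only so the sketch runs.
rng = np.random.default_rng(0)
movements = rng.normal(size=(60, 100))

# Create a normalizer: normalizer
normalizer = Normalizer()

# Create a KMeans model with 10 clusters: kmeans
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)

# Make a pipeline chaining normalizer and kmeans: pipeline
pipeline = make_pipeline(normalizer, kmeans)

# Fit pipeline to the daily price movements
pipeline.fit(movements)

# Cluster label for each company (one label per row)
labels = pipeline.predict(movements)
print(labels.shape)
```

Because the normalizer is the first pipeline step, calling pipeline.predict() applies the same row-wise rescaling before assigning cluster labels.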

Learn how to discover the underlying groups (or "clusters") in a dataset. By the end of this chapter, you'll be clustering companies using their stock market prices, and distinguishing different species by clustering their measurements.

Exercise 1: Unsupervised learning Exercise 2: How many clusters? Exercise 3: Clustering 2D points Exercise 4: Inspect your clustering Exercise 5: Evaluating a clustering Exercise 6: How many clusters of grain? Exercise 7: Evaluating the grain clustering Exercise 8: Transforming features for better clusterings Exercise 9: Scaling fish data for clustering Exercise 10: Clustering the fish data Exercise 11: Clustering stocks using KMeans (current exercise) Exercise 12: Which stocks move together?

In this chapter, you'll learn about two unsupervised learning techniques for data visualization, hierarchical clustering and t-SNE. Hierarchical clustering merges the data samples into ever-coarser clusters, yielding a tree visualization of the resulting cluster hierarchy. t-SNE maps the data samples into 2d space so that the proximity of the samples to one another can be visualized.

Exercise 1: Visualizing hierarchies Exercise 2: How many merges? Exercise 3: Hierarchical clustering of the grain data Exercise 4: Hierarchies of stocks Exercise 5: Cluster labels in hierarchical clustering Exercise 6: Which clusters are closest? Exercise 7: Different linkage, different hierarchical clustering! Exercise 8: Intermediate clusterings Exercise 9: Extracting the cluster labels Exercise 10: t-SNE for 2-dimensional maps Exercise 11: t-SNE visualization of grain dataset Exercise 12: A t-SNE map of the stock market

Dimension reduction summarizes a dataset using its commonly occurring patterns. In this chapter, you'll learn about the most fundamental of dimension reduction techniques, "Principal Component Analysis" ("PCA"). PCA is often used before supervised learning to improve model performance and generalization. It can also be useful for unsupervised learning. For example, you'll employ a variant of PCA that will allow you to cluster Wikipedia articles by their content!

Exercise 1: Visualizing the PCA transformation Exercise 2: Correlated data in nature Exercise 3: Decorrelating the grain measurements with PCA Exercise 4: Principal components Exercise 5: Intrinsic dimension Exercise 6: The first principal component Exercise 7: Variance of the PCA features Exercise 8: Intrinsic dimension of the fish data Exercise 9: Dimension reduction with PCA Exercise 10: Dimension reduction of the fish measurements Exercise 11: A tf-idf word-frequency array Exercise 12: Clustering Wikipedia part I Exercise 13: Clustering Wikipedia part II

In this chapter, you'll learn about a dimension reduction technique called "Non-negative matrix factorization" ("NMF") that expresses samples as combinations of interpretable parts. For example, it expresses documents as combinations of topics, and images in terms of commonly occurring visual patterns. You'll also learn to use NMF to build recommender systems that can find you similar articles to read, or musical artists that match your listening history!

Exercise 1: Non-negative matrix factorization (NMF) Exercise 2: Non-negative data Exercise 3: NMF applied to Wikipedia articles Exercise 4: NMF features of the Wikipedia articles Exercise 5: NMF reconstructs samples Exercise 6: NMF learns interpretable parts Exercise 7: NMF learns topics of documents Exercise 8: Explore the LED digits dataset Exercise 9: NMF learns the parts of images Exercise 10: PCA doesn't learn parts Exercise 11: Building recommender systems using NMF Exercise 12: Which articles are similar to 'Cristiano Ronaldo'? Exercise 13: Recommend musical artists part I Exercise 14: Recommend musical artists part II Exercise 15: Final thoughts