t-SNE for 2-dimensional maps

1. t-SNE for 2-dimensional maps

In this video, you'll learn about an unsupervised learning method for visualization called "t-SNE".

2. t-SNE for 2-dimensional maps

t-SNE stands for "t-distributed stochastic neighbor embedding". It has a complicated name, but it serves a very simple purpose. It maps samples from their high-dimensional space into a 2- or 3-dimensional space so they can be visualized. While some distortion is inevitable, t-SNE does a great job of approximately representing the distances between the samples. For this reason, t-SNE is an invaluable visual aid for understanding a dataset.

3. t-SNE on the iris dataset

To see what sorts of insights are possible with t-SNE, let's look at how it performs on the iris dataset. The iris samples are in a four-dimensional space, where each dimension corresponds to one of the four iris measurements, such as petal length and petal width. Now, t-SNE was given only the measurements of the iris samples. In particular, it wasn't given any information about the three species of iris. But if we color the species differently on the scatter plot, we see that t-SNE has kept the species separate.

4. Interpreting t-SNE scatter plots

This scatter plot gives us a new insight, however. We learn that there are two iris species, versicolor and virginica, whose samples are close together in space. So it could happen that the iris dataset appears to have two clusters, instead of three. This is compatible with our previous examples using k-means, where we saw that a clustering with 2 clusters also had relatively low inertia, meaning tight clusters.
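The point about inertia can be checked with a small sketch. It assumes the copy of the iris data that ships with scikit-learn (load_iris) and fits KMeans for a few cluster counts; the variable names, n_init and random_state are illustrative choices, not taken from the video.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled iris data stands in for the course's array.
samples = load_iris().data

# Fit k-means for several cluster counts and record the inertia of each fit.
inertias = {}
for k in (1, 2, 3, 4):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(samples)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

Most of the drop in inertia happens when going from one cluster to two, which fits the picture of two species sitting close together.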

5. t-SNE in sklearn

t-SNE is available in scikit-learn, but it works a little differently to the fit/transform components you've already met. Let's see it in action on the iris dataset. The samples are in a 2-dimensional NumPy array, and there is a list giving the species of each sample.
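One way to get data in this shape is sketched below. It assumes scikit-learn's load_iris as the source; the names samples and species are chosen here to match the narration, and the species are encoded as the integers 0, 1 and 2.

```python
from sklearn.datasets import load_iris

# Assumption: the bundled iris data stands in for the arrays used in the video.
iris = load_iris()

samples = iris.data              # 2-dimensional NumPy array: one row per sample
species = iris.target.tolist()   # list with one species code (0, 1 or 2) per sample

print(samples.shape)
print(list(iris.target_names))
```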

6. t-SNE in sklearn

To start with, import TSNE and create a TSNE object. Apply the fit_transform method to the samples, and then make a scatter plot of the result, coloring the points using the species. There are two aspects that deserve special attention: the fit_transform method, and the learning rate.
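The steps just described might look like the following sketch. The learning rate of 100 is an assumed value, and the data again comes from scikit-learn's load_iris rather than from the course files.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

# Assumption: bundled iris data, with species given as integer codes.
iris = load_iris()
samples, species = iris.data, iris.target

# Create the TSNE object; learning_rate=100 is an illustrative choice.
model = TSNE(learning_rate=100)

# fit_transform fits the model and returns the 2-dimensional coordinates.
transformed = model.fit_transform(samples)

# Scatter plot of the two t-SNE features, colored by species.
xs = transformed[:, 0]
ys = transformed[:, 1]
fig, ax = plt.subplots()
ax.scatter(xs, ys, c=species)
# In an interactive session, call plt.show() to display the figure.
```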

7. t-SNE has only fit_transform()

In practice, t-SNE is used through its fit_transform method. As you might expect, the fit_transform method simultaneously fits the model and transforms the data. However, t-SNE has no separate transform method that could be applied to new samples after fitting. This means that you can't extend a t-SNE map to include new samples. Instead, you have to start over each time.
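A short sketch of what "start over each time" means in code. The new_samples array is hypothetical, made by nudging a few existing rows, and the data is assumed to be scikit-learn's bundled iris set.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

samples = load_iris().data

model = TSNE(learning_rate=100)

# scikit-learn's TSNE offers no transform method for unseen samples.
has_transform = hasattr(model, "transform")
print(has_transform)

# Hypothetical new samples: a few existing rows, slightly shifted.
new_samples = samples[:5] + 0.1

# To place them on a map, rerun t-SNE on the old and new samples together.
combined = np.vstack([samples, new_samples])
embedding = TSNE(learning_rate=100).fit_transform(combined)
print(embedding.shape)
```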

8. t-SNE learning rate

The second thing to notice is the learning rate. The learning rate makes the use of t-SNE more complicated than some other techniques. You may need to try different learning rates for different datasets. It is clear, however, when you've made a bad choice, because all the samples appear bunched together in the scatter plot. Normally it's enough to try a few values between 50 and 200.
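Trying a few learning rates can be as simple as a loop. This is a sketch under assumptions: the three values are just picked from the 50 to 200 range mentioned above, and the per-axis standard deviation is used here only as a rough numeric hint of how spread out the map is; looking at the scatter plot remains the real check.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

samples = load_iris().data

# Candidate learning rates, chosen for illustration.
learning_rates = (50, 100, 200)

embeddings = {}
for lr in learning_rates:
    model = TSNE(learning_rate=lr, random_state=0)
    embeddings[lr] = model.fit_transform(samples)

# A map where all points are bunched together has very little spread.
for lr, embedding in embeddings.items():
    print(lr, embedding.std(axis=0))
```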

9. Different every time

A final thing to be aware of is that the axes of a t-SNE plot do not have any interpretable meaning. In fact, they are different every time t-SNE is applied, even on the same data. For example, here are three t-SNE plots of the scaled Piedmont wine samples, generated using the same code. Note that while the orientation of the plot is different each time, the three wine varieties, represented here using colors, have the same position relative to one another.
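The run-to-run variation can be seen directly in the coordinates. This sketch uses the iris data from scikit-learn instead of the wine samples, and it sets init="random" with two different random_state values as an assumption to make the two runs start from different random layouts.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

samples = load_iris().data

# Two runs of the same code, each starting from a different random layout.
first = TSNE(learning_rate=100, init="random", random_state=1).fit_transform(samples)
second = TSNE(learning_rate=100, init="random", random_state=2).fit_transform(samples)

# The coordinates of the same samples differ between the two maps.
same_coordinates = np.allclose(first, second)
print(same_coordinates)
```

Fixing random_state to a single value is one way to get the same map again when a reproducible figure is needed.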

10. Let's practice!

You are now equipped to use t-SNE to gain insight into some real-world datasets. Let's get some practice!

This exercise is part of the course

Unsupervised Learning in Python
