
t-Distributed Stochastic Neighbor Embedding (t-SNE)

1. t-Distributed Stochastic Neighbor Embedding (t-SNE)

Welcome back. Now that we've gained an intuition for PCA, let's explore a more advanced method of feature extraction — t-Distributed Stochastic Neighbor Embedding, or t-SNE.

2. t-SNE vs PCA

t-SNE differs from PCA in a number of ways. Here are a few. First, t-SNE is non-linear, meaning it can capture more complex relationships than PCA, which is linear.

3. t-SNE vs PCA

t-SNE is non-deterministic, meaning it is a random algorithm. For this reason, we'll often set the random number generator's seed to make examples reproducible.
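As a quick illustration, here is a minimal sketch of that idea, using R's built-in iris data rather than this lesson's attrition data:

library(Rtsne)

# t-SNE is random: two runs on the same data give different embeddings
# unless the random number generator seed is fixed before each run.
set.seed(1234)
tsne_run1 <- Rtsne(as.matrix(iris[, 1:4]), check_duplicates = FALSE)

set.seed(1234)  # same seed again...
tsne_run2 <- Rtsne(as.matrix(iris[, 1:4]), check_duplicates = FALSE)

# iris contains duplicate rows, hence check_duplicates = FALSE above
all.equal(tsne_run1$Y, tsne_run2$Y)  # ...so the coordinates match: TRUE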

4. t-SNE vs PCA

t-SNE handles outliers better than PCA, whose variance-maximizing components can be pulled strongly toward extreme points.

5. t-SNE vs PCA

t-SNE is also much more computationally expensive than PCA.

6. t-SNE vs PCA

Lastly, t-SNE has several hyperparameters that can be tuned to improve the quality of the embedding.

7. Plotting PCA and t-SNE

Let's continue to use the employee attrition data to compare PCA and t-SNE. On the left is PCA. On the right is t-SNE. These plots highlight how t-SNE preserves the local structure of the data, while PCA preserves the global structure. In other words, t-SNE attempts to keep neighboring data points next to each other as it embeds them into a lower-dimensional space. However, we see that t-SNE does no better than PCA at separating the employees who left from those who stayed. Tuning t-SNE's hyperparameters could yield some improvement, but tuning t-SNE is beyond the scope of this course.
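For reference, here is a hedged sketch of how such a side-by-side comparison could be coded; the data frame name attrition_df and its target column attrition are assumed stand-ins, and the feature columns are assumed to be numeric:

library(Rtsne)
library(ggplot2)

set.seed(1234)

# Split off the assumed target column
features <- attrition_df[, names(attrition_df) != "attrition"]

# PCA: linear and deterministic
pca <- prcomp(features, scale. = TRUE)
pca_df <- data.frame(pca$x[, 1:2], attrition = attrition_df$attrition)
ggplot(pca_df, aes(PC1, PC2, color = attrition)) + geom_point()

# t-SNE: non-linear and random
tsne <- Rtsne(as.matrix(features))
tsne_df <- data.frame(tsne$Y, attrition = attrition_df$attrition)
ggplot(tsne_df, aes(X1, X2, color = attrition)) + geom_point()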

8. t-SNE hyperparameters

We'll simply mention the main hyperparameters that can be tuned. The perplexity loosely determines the number of nearest neighbors the algorithm considers around each point. The learning rate controls how far the embedded points move at each step of the gradient descent optimization; t-SNE optimizes the embedding directly rather than training a neural network. Lastly, we can control the number of iterations the optimization runs: more iterations refine the embedding further, at the cost of additional computation, while too few can leave the embedding unconverged.
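In the Rtsne package, these hyperparameters correspond to arguments of Rtsne(). A minimal sketch, reusing the assumed features from above with the package's documented defaults:

library(Rtsne)

set.seed(1234)
tsne <- Rtsne(
  as.matrix(features),  # assumed numeric feature matrix
  perplexity = 30,      # effective number of nearest neighbors (default 30)
  eta        = 200,     # learning rate of the gradient descent (default 200)
  max_iter   = 1000     # number of optimization iterations (default 1000)
)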

9. t-SNE in R

Now let's see how we implement t-SNE in R. First we load the Rtsne library. For demonstration purposes, we set the random number generator seed to make the example reproducible. Then we call Rtsne() and pass it the data frame minus the target variable. We accept the default hyperparameter values. We store the results in tsne. The algorithm stores the two-dimensional coordinates in the Y object of tsne. So we use bind_cols() to add them to the original data frame and store the new data frame in tsne_df. Notice that we named the x and y tsne coordinate columns as tsne_x and tsne_y, respectively. Lastly, we pass tsne_df to ggplot() to create a scatterplot using geom_point().
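Putting those steps together, here is a minimal sketch; attrition_df and its target column attrition are assumed names standing in for the course's data:

library(Rtsne)
library(dplyr)
library(ggplot2)

set.seed(1234)  # t-SNE is random; fix the seed for reproducibility

# Run t-SNE on everything except the target, accepting default hyperparameters
tsne <- Rtsne(attrition_df %>% select(-attrition))

# The two-dimensional coordinates are stored in tsne$Y;
# bind them onto the original data frame as tsne_x and tsne_y
tsne_df <- attrition_df %>%
  bind_cols(tibble(tsne_x = tsne$Y[, 1],
                   tsne_y = tsne$Y[, 2]))

# Scatterplot of the two extracted features, colored by the target
ggplot(tsne_df, aes(x = tsne_x, y = tsne_y, color = attrition)) +
  geom_point()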

10. t-SNE plot

That code produces this t-SNE plot. The x and y axes are the features extracted by t-SNE. Remember that a major goal of dimensionality reduction is to maintain information about the target variable while eliminating uninformative dimensions. We can see that those two features do not separate the attrition observations very well. In other words, they do not help explain why some employees left and some stayed.

11. Let's practice!

Now it's your turn to explore t-SNE.