Exercise

Perplexity of a bigger MNIST dataset

Now, let's investigate how the perplexity value affects the embedding of a bigger MNIST dataset of 10,000 records.

Running t-SNE on this many records would take a long time on the DataCamp platform. For that reason, two pre-computed t-SNE outputs with perplexity values of 5 and 50, named tsne_output_5 and tsne_output_50, are available in the workspace.

We will look at the K-L divergence costs of both outputs and then plot the two embeddings, marking each point with its digit label from the mnist_10k dataset, which is also available in the environment.
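
As a rough sketch of what looking at the K-L costs can mean in practice: an object returned by Rtsne() stores the K-L divergence recorded at every 50th iteration in its itercosts element. The snippet below assumes the pre-loaded objects have that structure. The stand-in objects and the final_costs name are hypothetical; the stand-in values are placeholders so the code runs outside the workspace, not real t-SNE results.

```r
# Hypothetical stand-ins, created only when the real pre-loaded objects
# are missing. The values are placeholders, NOT real t-SNE costs.
if (!exists("tsne_output_5")) {
  tsne_output_5 <- list(itercosts = seq(5, 1, length.out = 5))
}
if (!exists("tsne_output_50")) {
  tsne_output_50 <- list(itercosts = seq(4, 1, length.out = 5))
}

# K-L divergence cost recorded at every 50th iteration
print(tsne_output_5$itercosts)
print(tsne_output_50$itercosts)

# Last recorded cost for each perplexity value, side by side
final_costs <- c(
  perplexity_5 = tail(tsne_output_5$itercosts, 1),
  perplexity_50 = tail(tsne_output_50$itercosts, 1)
)
print(final_costs)
```

In the exercise workspace the stand-in branches are skipped and the real costs are printed.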

The Rtsne and ggplot2 packages have been loaded.

Instructions
  • Inspect the obtained K-L divergence costs with perplexity 5 and 50, stored in tsne_output_5 and tsne_output_50.
  • Create two data frames named tsne_plot_5 and tsne_plot_50 with the first two coordinates of the embedding and the label from the mnist_10k dataset.
  • Plot the obtained embeddings with perplexity values of 5 and 50 using ggplot(). Give the points a label and color based on the value of the digit.
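
One possible way to carry out the last two steps is sketched below. It assumes that the embedding is stored in the Y matrix of each Rtsne output and that the digit is held in a column called label in mnist_10k. The helper make_tsne_plot_df, the column names tsne_x, tsne_y and digit, and the plot titles are hypothetical choices made for illustration. The stand-in data is random and exists only so the snippet runs outside the workspace.

```r
library(ggplot2)

# Hypothetical stand-ins, used only when the pre-loaded objects are missing.
# They are random numbers, NOT real MNIST data or t-SNE embeddings.
set.seed(1)
if (!exists("mnist_10k")) {
  mnist_10k <- data.frame(label = sample(0:9, 200, replace = TRUE))
}
if (!exists("tsne_output_5") || is.null(tsne_output_5$Y)) {
  tsne_output_5 <- list(Y = matrix(rnorm(2 * nrow(mnist_10k)), ncol = 2))
}
if (!exists("tsne_output_50") || is.null(tsne_output_50$Y)) {
  tsne_output_50 <- list(Y = matrix(rnorm(2 * nrow(mnist_10k)), ncol = 2))
}

# Hypothetical helper: first two embedding coordinates plus the digit label
make_tsne_plot_df <- function(tsne_output, labels) {
  data.frame(
    tsne_x = tsne_output$Y[, 1],
    tsne_y = tsne_output$Y[, 2],
    digit = as.factor(labels)
  )
}

tsne_plot_5 <- make_tsne_plot_df(tsne_output_5, mnist_10k$label)
tsne_plot_50 <- make_tsne_plot_df(tsne_output_50, mnist_10k$label)

# Draw each point as its digit, colored by digit; the legend is redundant
plot_5 <- ggplot(tsne_plot_5, aes(x = tsne_x, y = tsne_y, color = digit)) +
  ggtitle("MNIST t-SNE embedding, perplexity = 5") +
  geom_text(aes(label = digit)) +
  theme(legend.position = "none")

plot_50 <- ggplot(tsne_plot_50, aes(x = tsne_x, y = tsne_y, color = digit)) +
  ggtitle("MNIST t-SNE embedding, perplexity = 50") +
  geom_text(aes(label = digit)) +
  theme(legend.position = "none")

print(plot_5)
print(plot_50)
```

Converting the label to a factor makes ggplot2 use a discrete color per digit instead of a continuous gradient, and geom_text() prints the digit itself at each point, which covers both the label and the color requirement.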