Exercise

Computing similarities of digits 1 and 0

One way to measure label similarity for each digit is to compute the Euclidean distance in the lower-dimensional space obtained from the t-SNE algorithm. Using the previously calculated centroids stored in dt_prototypes, compute the Euclidean distance to the centroid of digit 1 for each of the last 5000 records of the tsne and mnist_10k datasets that is labeled either 1 or 0.
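
For reference, the distance in question can be written out as below. The symbols are notation introduced here for illustration and are not names from the exercise: (x_i, y_i) is the t-SNE position of record i, and (c_x, c_y) is the centroid of digit 1.

```latex
d_i = \sqrt{(x_i - c_x)^2 + (y_i - c_y)^2}
```

A small d_i means record i lies close to the digit 1 centroid in the t-SNE embedding.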

Note that the last 5000 records of tsne were not used before.

The MNIST data mnist_10k and t-SNE output tsne are available in the workspace. The data.table package has been loaded for you.

Instructions

  • Get the last 5000 records (5001 to 10000) from the t-SNE output and store the result in the distances data.table. Set the column names to "X" and "Y".
  • Use the true label of those 5000 records from mnist_10k to create the label column in distances.
  • Keep only the records whose true label is 1 or 0.
  • Compute the Euclidean distance from each remaining record to the centroid of digit 1 and store it in a column named dist1.
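
The four steps above could be sketched roughly as follows. This is a minimal illustration, not the exercise's reference solution. It rests on several assumptions: that tsne is a list whose Y element holds the 2D coordinates (as Rtsne returns), that dt_prototypes has one row per label with centroid columns called mean_X and mean_Y, and that the centroids were computed from the first 5000 records. So that the snippet runs on its own, the workspace objects are replaced with random stand-in data.

```r
library(data.table)

# Stand-in data (assumption): random values replace the real workspace objects.
set.seed(1)
n <- 10000
mnist_10k <- data.table(label = sample(0:9, n, replace = TRUE))
tsne <- list(Y = matrix(rnorm(2 * n), ncol = 2))  # assumed Rtsne-style output

# Assumed layout of dt_prototypes: one row per label with centroid coordinates
# (hypothetical column names mean_X and mean_Y), built from the first 5000 records.
train <- as.data.table(tsne$Y[1:5000, ])
setnames(train, c("X", "Y"))
train[, label := mnist_10k$label[1:5000]]
dt_prototypes <- train[, .(mean_X = mean(X), mean_Y = mean(Y)), by = label]

# Step 1: take records 5001 to 10000 of the t-SNE output and name the columns.
distances <- as.data.table(tsne$Y[5001:10000, ])
setnames(distances, c("X", "Y"))

# Step 2: attach the true labels of those same records.
distances[, label := mnist_10k$label[5001:10000]]

# Step 3: keep only the records whose true label is 1 or 0.
distances <- distances[label %in% c(0, 1)]

# Step 4: Euclidean distance from each record to the centroid of digit 1.
centroid_1 <- dt_prototypes[label == 1]
distances[, dist1 := sqrt((X - centroid_1$mean_X)^2 + (Y - centroid_1$mean_Y)^2)]
```

With the real data, records labeled 1 would be expected to show smaller dist1 values than records labeled 0; with the random stand-in data used here, no such separation appears.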