Exercise

Generating PCA from MNIST sample

You are going to compute a PCA on the mnist_sample dataset used previously. The goal is to obtain a good representation of each digit in a lower-dimensional space. PCA gives you a set of new variables, called principal components, each of which is a linear combination of the input variables. The principal components are ordered by the amount of variance they capture from the original data, from most to least. So, if you plot the first two principal components, you see the digits in the 2-dimensional projection that retains the most variance.
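
The two properties described above (components are linear combinations of the inputs, and they are ordered by captured variance) can be checked on a small synthetic example. This is only an illustrative sketch: the `toy` data frame and its columns are made up for the demonstration and are not part of the exercise.

```r
# Hypothetical toy data: x2 is strongly correlated with x1, x3 is mostly noise
set.seed(1)
x1 <- rnorm(100)
toy <- data.frame(
  x1 = x1,
  x2 = 2 * x1 + rnorm(100, sd = 0.5),
  x3 = rnorm(100, sd = 0.2)
)

# prcomp() centers the data by default and does not rescale it
toy_pca <- prcomp(toy)

# Each column of the rotation matrix holds the weights of one linear combination
print(toy_pca$rotation)

# Variance captured by each component, in decreasing order
print(toy_pca$sdev^2)

# The scores in toy_pca$x are the centered inputs multiplied by those weights
scores <- scale(toy, center = TRUE, scale = FALSE) %*% toy_pca$rotation
stopifnot(isTRUE(all.equal(unname(scores), unname(toy_pca$x))))
```

Because x1 and x2 move together, most of the variance should end up in the first component, which is the behavior the exercise relies on when it keeps only the first two components.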

A sample of 200 records from the MNIST dataset, named mnist_sample, is loaded for you.

Instructions

  • Compute PCA using the prcomp() function with default parameters on the features of mnist_sample.
  • Observe the results using the summary() function.
  • Store the first two coordinates of the PCA output and the label in a data frame.
  • Plot the first two principal components using ggplot() and color the points by digit label. Note that the aesthetics in ggplot() must refer to the columns of your data frame by name.
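
One possible way to work through these steps is sketched below. It is not an official solution. Since mnist_sample is only available inside the exercise environment, the sketch builds a random stand-in with the same shape, and it assumes the digit is stored in a column named `label` with every other column being a pixel feature. The names `features`, `pca_output`, `pca_plot`, `pca_x`, and `pca_y` are chosen here for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for mnist_sample (assumed layout: a `label` column
# plus 784 pixel columns). In the exercise, skip this and use the loaded data.
set.seed(42)
pixels <- matrix(sample(0:255, 200 * 784, replace = TRUE), nrow = 200)
mnist_sample <- data.frame(label = sample(0:9, 200, replace = TRUE), pixels)

# Step 1: PCA with default parameters on the features only (label excluded)
features <- mnist_sample[, names(mnist_sample) != "label"]
pca_output <- prcomp(features)

# Step 2: inspect the standard deviation and proportion of variance per component
summary(pca_output)

# Step 3: keep the first two coordinates and the label in a data frame
pca_plot <- data.frame(
  pca_x = pca_output$x[, "PC1"],
  pca_y = pca_output$x[, "PC2"],
  label = as.factor(mnist_sample$label)
)

# Step 4: plot the two components, referring to columns by name in aes()
ggplot(pca_plot, aes(x = pca_x, y = pca_y, color = label)) +
  geom_point()
```

Converting the label to a factor makes ggplot2 use a discrete color per digit instead of a continuous gradient. With the random stand-in data the points will not form clusters; any grouping by digit only appears with the real mnist_sample.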