Exercise

Variance explained

In this exercise, you will produce scree plots showing the proportion of variance explained as the number of principal components increases. The data from PCA must be prepared for these plots, as there is not a built-in function in R to create them directly from the PCA model.

As you look at these plots, ask yourself if there's an elbow in the amount of variance explained that might lead you to pick a natural number of principal components. If an obvious elbow does not exist, as is typical in real-world datasets, consider how else you might determine the number of principal components to retain based on the scree plot.

Instructions

100 XP

The variables you created before, wisc.data, diagnosis, and wisc.pr, are still available.

  • Calculate the variance of each principal component by squaring the sdev component of wisc.pr. Save the result as an object called pr.var.
  • Calculate the variance explained by each principal component by dividing by the total variance explained of all principal components. Assign this to a variable called pve.
  • Create a plot of variance explained for each principal component.
  • Using the cumsum() function, create a plot of cumulative proportion of variance explained.