Visualizing PCs with a scree plot

In a machine learning interview, you may be asked what the optimal number of features to keep is. In this exercise, you'll run PCA on loan_data and create two plots of the resulting principal components: a scree plot and a cumulative explained variance ratio plot. Together, they help you decide how many PCs to keep when training a more accurate ML model going forward.

Because PCA is an unsupervised method, principal component analysis is performed on the feature matrix X, with the target variable Loan Status removed from the dataset. Leaving n_components unset returns all the principal components from the fitted model.
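A minimal scikit-learn sketch of this idea follows. It assumes `loan_data` is a pandas DataFrame of numeric features plus a target column named `"Loan Status"`; the synthetic data is a hypothetical stand-in for the real dataset. The sketch drops the target before fitting. It also shows that, with `n_components` unset, the fitted model keeps `min(n_samples, n_features)` components, and that their explained variance ratios sum to 1.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for loan_data: 200 rows, 12 numeric features and a
# binary "Loan Status" target. The real dataset's columns will differ.
rng = np.random.default_rng(0)
loan_data = pd.DataFrame(
    rng.normal(size=(200, 12)),
    columns=[f"feature_{i}" for i in range(12)],
)
loan_data["Loan Status"] = rng.integers(0, 2, size=200)

# PCA is unsupervised, so the target column is dropped before fitting.
X = loan_data.drop("Loan Status", axis=1)

# n_components is not set, so PCA keeps min(n_samples, n_features) = 12 PCs.
pca = PCA()
X_pca = pca.fit_transform(X)

print(pca.n_components_)  # 12: one PC per feature in this example
print(pca.explained_variance_ratio_.sum())  # approximately 1.0
```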

Instructions
  • Step 1
    • Create a data matrix X by removing the target variable.
    • Instantiate, fit, and transform a PCA object that returns 10 PCs.
  • Step 2
    • Create a DataFrame, pca_df, mapping Variance Explained to the explained variance ratio.
    • Create a scree plot from pca_df with the PCs on the x-axis and the explained variance on the y-axis.
  • Step 3
    • Instantiate, fit, and transform a PCA object without setting n_components.
    • Print the explained variance ratio.
  • Step 4
    • Assign the cumulative sum of the explained variance ratios from the previous step to cumulative_var.
    • Plot the results.
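The four steps above can be sketched end to end as follows. This is a minimal sketch, not the course's reference solution. `loan_data` is replaced by a hypothetical synthetic DataFrame with 15 numeric columns plus a `"Loan Status"` target. The PC labels (`"PC1"`, `"PC2"`, ...) are illustrative, and the 90% cutoff at the end is an arbitrary example threshold rather than a rule.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for loan_data: 15 numeric features plus the target.
rng = np.random.default_rng(42)
loan_data = pd.DataFrame(
    rng.normal(size=(300, 15)), columns=[f"feature_{i}" for i in range(15)]
)
loan_data["Loan Status"] = rng.integers(0, 2, size=300)

# Step 1: drop the target, then fit and transform a PCA that keeps 10 PCs.
X = loan_data.drop("Loan Status", axis=1)
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X)

# Step 2: map each PC to its explained variance ratio and draw a scree plot.
pca_df = pd.DataFrame({
    "PC": [f"PC{i}" for i in range(1, 11)],
    "Variance Explained": pca.explained_variance_ratio_,
})
fig, ax = plt.subplots()
ax.plot(pca_df["PC"], pca_df["Variance Explained"], marker="o")
ax.set_xlabel("Principal component")
ax.set_ylabel("Explained variance ratio")
ax.set_title("Scree plot")

# Step 3: leave n_components unset so every PC is returned.
pca_all = PCA()
pca_all.fit_transform(X)
print(pca_all.explained_variance_ratio_)

# Step 4: cumulative explained variance across all PCs, then plot it.
cumulative_var = np.cumsum(pca_all.explained_variance_ratio_)
fig2, ax2 = plt.subplots()
ax2.plot(range(1, len(cumulative_var) + 1), cumulative_var, marker="o")
ax2.set_xlabel("Number of principal components")
ax2.set_ylabel("Cumulative explained variance ratio")
plt.show()  # no-op (with a warning) under the Agg backend

# Illustrative only: smallest number of PCs reaching an assumed 90% threshold.
n_keep = int(np.argmax(cumulative_var >= 0.90)) + 1
print(n_keep)
```

Because the synthetic features are independent noise, both curves here are close to flat or linear. On real, correlated features, the first few PCs typically account for most of the variance. That is where the scree plot shows an "elbow" and the cumulative curve levels off.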