Practicing Machine Learning Interview Questions in Python

Exercise

Visualizing PCs with a scree plot

In a machine learning interview, you may be asked how many features are optimal to keep. In this exercise, you'll apply PCA to loan_data and create a scree plot and a cumulative explained variance ratio plot of the principal components. These plots help you choose the number of PCs to use when training a more accurate ML model.

Because PCA is an unsupervised method, it is performed on the feature matrix X after removing the target variable Loan Status from the dataset. Leaving n_components unset returns all the principal components from the fitted model.
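As a rough illustration of that point, the sketch below fits PCA with and without n_components. The real loan_data is not reproduced here, so toy_loans is a hypothetical random stand-in with 12 numeric features plus a Loan Status column; only the component counts matter.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for loan_data: 200 rows, 12 numeric features plus the target.
rng = np.random.default_rng(0)
toy_loans = pd.DataFrame(
    rng.normal(size=(200, 12)),
    columns=[f"feature_{i}" for i in range(12)],
)
toy_loans["Loan Status"] = rng.integers(0, 2, size=200)

# PCA is unsupervised, so the target column is dropped before fitting.
X = toy_loans.drop("Loan Status", axis=1)

# With n_components set, only that many components are kept.
pca_10 = PCA(n_components=10).fit(X)

# With n_components unset, scikit-learn keeps min(n_samples, n_features) components.
pca_all = PCA().fit(X)

n_kept_10 = pca_10.n_components_
n_kept_all = pca_all.n_components_
print(n_kept_10, n_kept_all)
```

On this toy data the unset version keeps one component per feature column, because there are more rows than features.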

Instructions 1/4

  • 1
    • Create a data matrix X, removing the target variable.
    • Instantiate, fit and transform a PCA object that returns 10 PCs.
  • 2
    • Create a DataFrame mapping Variance Explained to the explained variance ratio.
    • Create a scree plot from pca_df setting your PCs on the x-axis and explained variance on the y-axis.
  • 3
    • Instantiate, fit and transform a PCA object not setting n_components.
    • Print the explained variance ratio.
  • 4
    • Assign the cumulative sum of the explained variance ratios from the previous step to cumulative_var.
    • Plot the results.
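
The four steps above could be sketched roughly as follows. This is not the course's reference solution: toy_loans is a hypothetical random stand-in for loan_data, the PC labels and axis names are illustrative choices, and plain matplotlib is assumed for plotting.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Hypothetical stand-in for loan_data (the real dataset is not reproduced here).
rng = np.random.default_rng(42)
toy_loans = pd.DataFrame(
    rng.normal(size=(300, 15)),
    columns=[f"feature_{i}" for i in range(15)],
)
toy_loans["Loan Status"] = rng.integers(0, 2, size=300)

# Step 1: build X without the target, then fit and transform a 10-component PCA.
X = toy_loans.drop("Loan Status", axis=1)
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X)

# Step 2: map "Variance Explained" to the explained variance ratio and draw a scree plot.
pca_df = pd.DataFrame({
    "Variance Explained": pca.explained_variance_ratio_,
    "PC": [f"PC{i}" for i in range(1, 11)],
})
fig, (ax_scree, ax_cum) = plt.subplots(1, 2, figsize=(12, 4))
ax_scree.bar(pca_df["PC"], pca_df["Variance Explained"])
ax_scree.set_xlabel("PC")
ax_scree.set_ylabel("Variance Explained")
ax_scree.set_title("Scree plot")

# Step 3: refit without n_components so every component is returned.
pca_full = PCA()
pca_full.fit_transform(X)
print(pca_full.explained_variance_ratio_)

# Step 4: cumulative sum of the explained variance ratios, then plot it.
cumulative_var = np.cumsum(pca_full.explained_variance_ratio_)
ax_cum.plot(range(1, len(cumulative_var) + 1), cumulative_var, marker="o")
ax_cum.set_xlabel("Number of PCs")
ax_cum.set_ylabel("Cumulative explained variance ratio")
ax_cum.set_title("Cumulative explained variance")
fig.tight_layout()
```

Call plt.show() to display the figure in an interactive session. With real data, a common heuristic is to keep the PCs up to the "elbow" of the scree plot, or enough PCs for the cumulative curve to pass a chosen threshold.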