Visualizing PCs with a scree plot
In a machine learning interview, you may be asked what the optimal number of features to keep is. In this exercise, you'll create a scree plot and a cumulative explained variance ratio plot of the principal components using PCA on loan_data.
These plots help you decide how many PCs to keep when training an ML model going forward.
Since PCA is an unsupervised method, it is performed on the X matrix after removing the target variable Loan Status from the dataset. Leaving n_components unset returns all the principal components from the trained model.
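To make the two plots concrete, here is a minimal NumPy-only sketch of the quantities they display: the explained variance ratio per component (scree plot) and its running total (cumulative plot). The feature matrix below is synthetic and only stands in for the real X. The 95% threshold and the name n_keep are illustrative assumptions, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a numeric feature matrix X (200 rows, 5 features);
# this is NOT the real loan_data, just columns with different spreads.
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

# Center each column, as PCA does before decomposing the data.
X_centered = X - X.mean(axis=0)

# The singular values of the centered matrix give the variance
# captured along each principal component (largest first).
_, singular_values, _ = np.linalg.svd(X_centered, full_matrices=False)
explained_variance = singular_values ** 2 / (X.shape[0] - 1)

# Scree plot values: share of the total variance per component.
explained_variance_ratio = explained_variance / explained_variance.sum()

# Cumulative plot values: running total of those shares.
cumulative_ratio = np.cumsum(explained_variance_ratio)

# Illustrative rule of thumb (an assumption): keep the smallest number
# of PCs whose cumulative share reaches 95% of the variance.
n_keep = int(np.searchsorted(cumulative_ratio, 0.95) + 1)

print(explained_variance_ratio)
print(cumulative_ratio)
print(n_keep)
```

On a scree plot you would look for the "elbow" where the per-component ratio flattens out. On the cumulative plot you would read off how many components are needed to reach a chosen share of the variance.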
This exercise is part of the course Practicing Machine Learning Interview Questions in Python.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Remove target variable
X = loan_data.____('____', axis=1)
# Instantiate
pca = ____(n_components=____)
# Fit and transform
principalComponents = pca.____(____)
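For reference, one possible way to complete the steps above and draw both plots is sketched below. It is not the official solution. It assumes scikit-learn's PCA, pandas and matplotlib are available, and it builds a small hypothetical loan_data frame (columns f1 to f4 are invented for illustration) because the real dataset is not shown here.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Hypothetical stand-in for loan_data: four numeric features plus the target.
rng = np.random.default_rng(42)
loan_data = pd.DataFrame(rng.normal(size=(100, 4)), columns=["f1", "f2", "f3", "f4"])
loan_data["Loan Status"] = rng.integers(0, 2, size=100)

# Remove target variable
X = loan_data.drop("Loan Status", axis=1)

# Instantiate; n_components=None (the default) keeps all components
pca = PCA(n_components=None)

# Fit and transform
principalComponents = pca.fit_transform(X)

# Per-component and cumulative explained variance ratios
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)
pcs = np.arange(1, len(ratios) + 1)

# Scree plot (left) and cumulative explained variance ratio plot (right)
fig, (ax_scree, ax_cum) = plt.subplots(1, 2, figsize=(10, 4))
ax_scree.plot(pcs, ratios, marker="o")
ax_scree.set_xlabel("Principal component")
ax_scree.set_ylabel("Explained variance ratio")
ax_scree.set_title("Scree plot")
ax_cum.plot(pcs, cumulative, marker="o")
ax_cum.set_xlabel("Number of principal components")
ax_cum.set_ylabel("Cumulative explained variance ratio")
ax_cum.set_title("Cumulative explained variance")
fig.tight_layout()
```

In an interactive session you would drop the matplotlib.use("Agg") line and call plt.show() at the end. With real data whose features are on very different scales, standardizing X before PCA is a common extra step that this sketch leaves out.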