Variance of the PCA features
The fish dataset is 6-dimensional. But what is its intrinsic dimension? Make a plot of the variances of the PCA features to find out. As before, samples is a 2D array, where each row represents a fish. You'll need to standardize the features first.
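To see why standardizing comes first, here is a minimal sketch on made-up data (not the fish measurements). The array demo, its three columns, and their scales are all assumptions chosen for illustration. Without scaling, the feature with the largest numeric range dominates the PCA variances. After scaling, the variances reflect the correlation structure instead.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 200 samples, 3 features on very different scales
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
demo = np.hstack([
    1000.0 * base + rng.normal(size=(200, 1)),  # large-scale feature
    base + 0.1 * rng.normal(size=(200, 1)),     # correlated with the first, small scale
    rng.normal(size=(200, 1)),                  # independent, small scale
])

# Variances of the PCA features without and with standardization
raw_var = PCA().fit(demo).explained_variance_
scaled_var = PCA().fit(StandardScaler().fit_transform(demo)).explained_variance_

print("unscaled:", raw_var)
print("standardized:", scaled_var)
```

In the unscaled case nearly all of the variance lands on the first PCA feature simply because one column has large units, which would make the intrinsic dimension look like 1 regardless of the real structure.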
This exercise is part of the course
Unsupervised Learning in Python
Exercise instructions
- Create an instance of StandardScaler called scaler.
- Create a PCA instance called pca.
- Use the make_pipeline() function to create a pipeline chaining scaler and pca.
- Use the .fit() method of pipeline to fit it to the fish samples samples.
- Extract the number of components used using the .n_components_ attribute of pca. Place this inside a range() function and store the result as features.
- Use the plt.bar() function to plot the explained variances, with features on the x-axis and pca.explained_variance_ on the y-axis.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Perform the necessary imports
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
import matplotlib.pyplot as plt
# Create scaler: scaler
scaler = ____
# Create a PCA instance: pca
pca = ____
# Create pipeline: pipeline
pipeline = ____
# Fit the pipeline to 'samples'
____
# Plot the explained variances
features = ____
plt.bar(____, ____)
plt.xlabel('PCA feature')
plt.ylabel('variance')
plt.xticks(features)
plt.show()
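One possible way to fill in the blanks is sketched below. Because the fish data is not included here, samples is replaced by a synthetic stand-in (an assumption): 6 columns generated from 2 hidden factors plus a little noise, with an arbitrary row count of 85. The output filename is likewise hypothetical. The pipeline steps follow the exercise instructions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the fish samples (assumption, not the real data):
# 6 measured columns driven by 2 latent factors plus small noise
rng = np.random.default_rng(42)
latent = rng.normal(size=(85, 2))
mixing = rng.normal(size=(2, 6))
samples = latent @ mixing + 0.05 * rng.normal(size=(85, 6))

# Standardize, then run PCA, chained in one pipeline
scaler = StandardScaler()
pca = PCA()
pipeline = make_pipeline(scaler, pca)
pipeline.fit(samples)

# One bar per PCA feature, with its variance as the bar height
features = range(pca.n_components_)
plt.bar(features, pca.explained_variance_)
plt.xlabel('PCA feature')
plt.ylabel('variance')
plt.xticks(features)
plt.savefig("pca_variances.png")  # hypothetical filename; use plt.show() interactively
```

The intrinsic dimension is read off the plot as the number of PCA features whose variance is clearly larger than the rest. For this synthetic stand-in, most of the variance should sit in the first two bars, since the data was built from two factors. The real fish data may give a different answer.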