The first principal component
The first principal component of the data is the direction in which the data varies the most. In this exercise, your job is to use PCA to find the first principal component of the length and width measurements of the grain samples, and represent it as an arrow on the scatter plot.
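To make "the direction in which the data varies the most" concrete, here is a minimal NumPy-only sketch. It assumes synthetic stand-in data (the points array and the projected_variance helper are illustrative names, not part of the exercise) and relies on the standard fact that the first principal component is the eigenvector of the covariance matrix with the largest eigenvalue.

```python
import numpy as np

# Synthetic stand-in data (an assumption, not the course's grain measurements).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
points = np.column_stack([x, 0.5 * x + rng.normal(scale=0.1, size=200)])

# Centre the data and compute its 2x2 covariance matrix.
centered = points - points.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh returns eigenvalues in ascending order, so the last column
# of eigenvectors is the direction of greatest variance.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
first_pc = eigenvectors[:, -1]


def projected_variance(direction):
    """Sample variance of the centred data projected onto a unit vector (hypothetical helper)."""
    return np.var(centered @ direction, ddof=1)


print("first principal component:", first_pc)
print("variance along it:", projected_variance(first_pc))
```

Projecting the data onto any other unit vector should give a variance no larger than the one along first_pc. Note that the sign of an eigenvector is arbitrary, so the arrow may point either way along the same line.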
The array grains gives the length and width of the grain samples. PyPlot (plt) and PCA have already been imported for you.
This exercise is part of the course Unsupervised Learning in Python.
Exercise instructions
- Make a scatter plot of the grain measurements. This has been done for you.
- Create a PCA instance called model.
- Fit the model to the grains data.
- Extract the coordinates of the mean of the data using the .mean_ attribute of model.
- Get the first principal component of model using the .components_[0,:] attribute.
- Plot the first principal component as an arrow on the scatter plot, using the plt.arrow() function. You have to specify the first two arguments: mean[0] and mean[1].
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Make a scatter plot of the untransformed points
plt.scatter(grains[:,0], grains[:,1])
# Create a PCA instance: model
model = ____
# Fit model to points
____
# Get the mean of the grain samples: mean
mean = ____
# Get the first principal component: first_pc
first_pc = ____
# Plot first_pc as an arrow, starting at mean
plt.arrow(____, ____, first_pc[0], first_pc[1], color='red', width=0.01)
# Keep axes on same scale
plt.axis('equal')
plt.show()
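One possible way to fill in the blanks is sketched below. It is not the course's official solution. Because the real grains array is only available inside the exercise environment, the sketch builds a synthetic stand-in with two correlated columns (an assumption), and it adds its own imports so that it runs on its own.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a plot window
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Synthetic stand-in for the grains array (assumption): columns are length and width.
rng = np.random.default_rng(42)
length = rng.normal(loc=5.6, scale=0.4, size=200)
width = 0.6 * length + rng.normal(scale=0.1, size=200)
grains = np.column_stack([length, width])

# Make a scatter plot of the untransformed points
plt.scatter(grains[:, 0], grains[:, 1])

# Create a PCA instance: model
model = PCA()

# Fit model to points
model.fit(grains)

# Get the mean of the grain samples: mean
mean = model.mean_

# Get the first principal component: first_pc
first_pc = model.components_[0, :]

# Plot first_pc as an arrow, starting at mean
plt.arrow(mean[0], mean[1], first_pc[0], first_pc[1], color='red', width=0.01)

# Keep axes on same scale
plt.axis('equal')
plt.show()
```

The rows of components_ are unit-length vectors, so the arrow has length 1 in data units. Keeping the axes on the same scale is what makes the arrow's direction visually match the spread of the points.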