Clustering stocks using KMeans
In this exercise, you'll cluster companies using their daily stock price movements (i.e. the dollar difference between the closing and opening prices for each trading day). You are given a NumPy array movements of daily price movements from 2010 to 2015 (obtained from Yahoo! Finance), where each row corresponds to a company, and each column corresponds to a trading day.
Some stocks are more expensive than others. To account for this, include a Normalizer at the beginning of your pipeline. The Normalizer will separately transform each company's stock price to a relative scale before the clustering begins.
Note that Normalizer() is different from StandardScaler(), which you used in the previous exercise. While StandardScaler() standardizes features (such as the features of the fish data from the previous exercise) by removing the mean and scaling to unit variance, Normalizer() rescales each sample - here, each company's stock price - independently of the others.
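The difference between the two transformers can be seen on a tiny example. The sketch below is only illustrative: toy_movements is a made-up array (not the exercise data) in which the second row is the first row multiplied by 10, standing in for a more expensive stock with the same pattern of movements. It assumes Normalizer's default L2 norm.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Hypothetical toy data: 2 "companies" (rows) x 3 "trading days" (columns).
# The second row is the first row scaled by 10 (a more expensive stock).
toy_movements = np.array([[1.0, -2.0, 2.0],
                          [10.0, -20.0, 20.0]])

# Normalizer works row by row: each sample is rescaled to unit L2 norm.
row_normalized = Normalizer().fit_transform(toy_movements)

# StandardScaler works column by column: each feature is shifted to
# mean 0 and scaled to unit variance.
col_standardized = StandardScaler().fit_transform(toy_movements)

row_norms = np.linalg.norm(row_normalized, axis=1)
col_means = col_standardized.mean(axis=0)

print(row_normalized)  # both rows end up identical
print(row_norms)       # each row has length 1
print(col_means)       # each column has mean 0
```

After normalization the two rows coincide, so a clustering step would treat the two toy companies as behaving alike regardless of price level.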
KMeans and make_pipeline have already been imported for you.
This exercise is part of the course Unsupervised Learning in Python
Exercise instructions
- Import Normalizer from sklearn.preprocessing.
- Create an instance of Normalizer called normalizer.
- Create an instance of KMeans called kmeans with 10 clusters.
- Using make_pipeline(), create a pipeline called pipeline that chains normalizer and kmeans.
- Fit the pipeline to the movements array.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import Normalizer
____
# Create a normalizer: normalizer
normalizer = ____
# Create a KMeans model with 10 clusters: kmeans
kmeans = ____
# Make a pipeline chaining normalizer and kmeans: pipeline
pipeline = ____
# Fit pipeline to the daily price movements
____
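One possible way to fill in the blanks is sketched below. Because the real movements array is not included here, the sketch generates a random stand-in of an arbitrary shape (60 companies by 100 trading days); that array, the n_init value, and the random_state seed are assumptions added only so the example runs on its own and gives repeatable results. In the exercise environment, KMeans and make_pipeline are already imported and movements is already defined.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

# Import Normalizer
from sklearn.preprocessing import Normalizer

# Synthetic stand-in for the exercise's movements array (an assumption):
# 60 "companies" (rows) x 100 "trading days" (columns).
rng = np.random.default_rng(0)
movements = rng.normal(size=(60, 100))

# Create a normalizer: normalizer
normalizer = Normalizer()

# Create a KMeans model with 10 clusters: kmeans
# n_init and random_state are not required by the exercise; they are set
# here only for repeatability.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)

# Make a pipeline chaining normalizer and kmeans: pipeline
pipeline = make_pipeline(normalizer, kmeans)

# Fit pipeline to the daily price movements
pipeline.fit(movements)

# Cluster label (0 to 9) assigned to each company
labels = pipeline.predict(movements)
print(labels.shape)
```

Calling predict on the fitted pipeline applies the normalizer first and then assigns each row to its nearest cluster centre.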