Exercise

# Elon's tweets

In this exercise, you will attempt the impossible: detecting patterns in Elon Musk's tweets!

You will apply two unsupervised learning algorithms:

- Dimensionality reduction, to translate your text data into a 2D space.
- Clustering, to find groups of similar tweets.

The go-to model for dimensionality reduction is `Principal Component Analysis` (PCA), while the `KMeans` algorithm plays the same role in clustering.

- Tweets in their raw form were loaded into the variable named `tweets_raw`.
- They have also been translated into a machine-digestible, vectorized form, contained in the variable `tweets_matrix`.
- To write less code, we want you to use the functions for *combined* fitting and transformation/prediction: `.fit_transform()` and `.fit_predict()`.
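
To illustrate what "combined" means here, the following minimal sketch compares the one-call forms with their two-step equivalents in scikit-learn. It uses synthetic data rather than `tweets_matrix`; the array `X`, the random seeds, and `n_init=10` are assumptions made only for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for a vectorized dataset: two well-separated groups
# of 20 rows with 5 features each (this is not the exercise's tweets_matrix).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(20, 5)),
    rng.normal(10.0, 1.0, size=(20, 5)),
])

# Transformer: fit and transform in a single call ...
reduced_combined = PCA(n_components=2).fit_transform(X)
# ... versus fitting first and transforming afterwards.
pca = PCA(n_components=2).fit(X)
reduced_two_step = pca.transform(X)

# Clusterer: fit and predict in a single call ...
labels_combined = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# ... versus fitting first and predicting afterwards.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels_two_step = kmeans.predict(X)

# Both comparisons check that the combined and two-step forms agree.
print(np.allclose(reduced_combined, reduced_two_step))
print(np.array_equal(labels_combined, labels_two_step))
```

The combined methods save a line each and avoid keeping an intermediate fitted object around when you only need the transformed data or the cluster labels.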

*Be aware that this is real data from Twitter and as such there is always a risk that it may contain profanity or other offensive content (in this exercise, and any following exercises that also use real Twitter data).*

Instructions

**100 XP**

- Set the algorithm for dimensionality reduction and the number of dimensions to 2.
- Apply dimensionality reduction to the vectorized dataset `tweets_matrix`.
- Configure the clustering model to find two clusters in the input data.
- Find clusters within the reduced dataset and display the results. This latter step has already been done for you.
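
The four steps above can be sketched on a toy corpus as follows. This is an illustration under stated assumptions, not the exercise's own setup: the `docs` list is a set of made-up stand-in sentences, TF-IDF is assumed as the vectorization (the exercise does not say how `tweets_matrix` was built), and `n_init` and `random_state` are chosen only to make the run repeatable.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in texts; the exercise's tweets_raw is not reproduced here.
docs = [
    "the rocket launch was a success",
    "the rocket landed on the drone ship",
    "another rocket launch is planned for next week",
    "the new car has a longer battery range",
    "the car factory is ramping up production",
    "battery production for the car is on schedule",
]

# Assumed vectorization step: TF-IDF, converted to a dense array
# so that PCA accepts it regardless of the scikit-learn version.
matrix = TfidfVectorizer().fit_transform(docs).toarray()

# Steps 1 and 2: set up PCA with two dimensions and reduce the vectorized data.
pca = PCA(n_components=2)
reduced = pca.fit_transform(matrix)

# Steps 3 and 4: configure KMeans for two clusters and cluster the reduced data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(reduced)

# Display each text next to its assigned cluster label.
for text, label in zip(docs, labels):
    print(label, text)
```

The cluster numbers themselves carry no meaning; only which texts share a label matters.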