Exercise

Elon's tweets

In this exercise, you will attempt the impossible: detecting patterns in Elon Musk's tweets!

You will apply two unsupervised learning algorithms:

  • Dimensionality reduction, to translate your text data into a 2D space.
  • Clustering, to find groups of similar tweets.

The go-to model for dimensionality reduction is Principal Component Analysis (PCA), and the go-to algorithm for clustering is KMeans.

  • The raw tweets have been loaded into the variable tweets_raw.
  • They have also been translated into a machine-digestible, vectorized form, stored in the variable tweets_matrix.
  • To write less code, use the methods that combine fitting with transformation or prediction: .fit_transform() and .fit_predict().
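
As a rough illustration of why the combined methods save code, the sketch below compares the two-step and one-step forms in scikit-learn. The array X is a small random stand-in, not the exercise's tweets_matrix, and the svd_solver, n_init, and random_state settings are assumptions chosen only to make the comparison deterministic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical stand-in data: 20 samples with 5 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))

# Two-step form: fit the model, then transform the same data.
pca_two_step = PCA(n_components=2, svd_solver="full")
pca_two_step.fit(X)
reduced_two_step = pca_two_step.transform(X)

# Combined form: one call does both.
reduced_combined = PCA(n_components=2, svd_solver="full").fit_transform(X)

# Both routes should give the same 2D projection (up to floating-point error).
same_projection = np.allclose(reduced_two_step, reduced_combined)

# Two-step clustering: fit, then read the labels assigned during fitting.
km_two_step = KMeans(n_clusters=2, n_init=10, random_state=0)
km_two_step.fit(reduced_combined)
labels_two_step = km_two_step.labels_

# Combined form: fit and return the labels in one call.
labels_combined = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced_combined)

same_labels = np.array_equal(labels_two_step, labels_combined)
```

The combined calls are purely a convenience when the data used for fitting is the same data you want transformed or labeled, as is the case in this exercise.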

Be aware that this is real data from Twitter and as such there is always a risk that it may contain profanity or other offensive content (in this exercise, and any following exercises that also use real Twitter data).

Instructions

  • Choose the algorithm for dimensionality reduction and set its number of dimensions to 2.
  • Apply dimensionality reduction to the vectorized dataset tweets_matrix.
  • Configure the clustering model to find two clusters in the input data.
  • Find clusters within the reduced dataset and display the results. The display step has already been done for you.
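
The four steps above might be sketched as follows. This is one possible approach, not the official solution: the tweets and the TfidfVectorizer used to build tweets_matrix here are made-up stand-ins for the preloaded variables, and the names pca, tweets_reduced, kmeans, and clusters are illustrative. The n_init and random_state arguments are assumptions added for reproducibility.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for the preloaded tweets_raw and tweets_matrix.
tweets_raw = [
    "rocket launch scheduled for tonight",
    "rocket landing was a success",
    "new electric car model announced",
    "electric car production is ramping up",
    "launch window moved to tomorrow",
    "car software update rolling out",
]
# Assumption: a dense TF-IDF matrix; the exercise's own vectorization may differ.
tweets_matrix = TfidfVectorizer().fit_transform(tweets_raw).toarray()

# Step 1: set the dimensionality reduction algorithm and target dimensions.
pca = PCA(n_components=2)

# Step 2: fit PCA and project the vectorized tweets into 2D in one call.
tweets_reduced = pca.fit_transform(tweets_matrix)

# Step 3: configure the clustering model to look for two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# Step 4: fit KMeans on the reduced data and get a cluster label per tweet.
clusters = kmeans.fit_predict(tweets_reduced)

# Display each tweet next to its assigned cluster label.
for label, tweet in zip(clusters, tweets_raw):
    print(label, tweet)
```

Clustering the 2D projection rather than the full matrix keeps the example easy to plot, at the cost of discarding whatever variance the first two principal components do not capture.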