Visualizing and comparing word embeddings
Word embeddings are high-dimensional, making them hard to interpret directly. In this exercise, you'll project a few word vectors down to 2D using Principal Component Analysis (PCA) and visualize them. This helps reveal semantic groupings or similarities between words in the embedding space. Then, you will compare the embedding representations of two models: glove-wiki-gigaword-50, available through the variable model_glove_wiki, and glove-twitter-25, available through model_glove_twitter.
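Both models come pre-loaded in the exercise environment. If you want to reproduce the setup on your own machine, a minimal sketch using gensim's downloader API (assuming gensim is installed) could look like this:

import gensim.downloader as api

# Download (on first use) and load the two pretrained GloVe models
model_glove_wiki = api.load("glove-wiki-gigaword-50")   # 50-dimensional vectors
model_glove_twitter = api.load("glove-twitter-25")      # 25-dimensional vectors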
This exercise is part of the course Natural Language Processing (NLP) in Python.
Have a go at this exercise by completing this sample code.
words = ["lion", "tiger", "leopard", "banana", "strawberry", "truck", "car", "bus"]
# Extract word embeddings
word_vectors = [____[____] for word in words]
# Reduce dimensions with PCA
pca = PCA(n_components=2)
word_vectors_2d = pca.____(____)
plt.scatter(word_vectors_2d[:, 0], word_vectors_2d[:, 1])
for word, (x, y) in zip(words, word_vectors_2d):
plt.annotate(word, (x, y))
plt.title("GloVe Wikipedia Word Embeddings (2D PCA)")
plt.show()
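One possible completion (a sketch, not the official solution) fills the blanks with model_glove_wiki[word] and pca.fit_transform(word_vectors). To compare the two models, you could repeat the projection with model_glove_twitter and plot the results side by side, for example:

# Sketch: project the same words with both models, side by side
# (assumes model_glove_wiki and model_glove_twitter are loaded gensim KeyedVectors;
# a KeyError would be raised if a word is missing from a model's vocabulary)
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

words = ["lion", "tiger", "leopard", "banana", "strawberry", "truck", "car", "bus"]

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, model, title in zip(
    axes,
    [model_glove_wiki, model_glove_twitter],
    ["GloVe Wikipedia (50d)", "GloVe Twitter (25d)"],
):
    # Look up each word's vector and reduce to 2D with PCA
    vectors_2d = PCA(n_components=2).fit_transform([model[word] for word in words])
    ax.scatter(vectors_2d[:, 0], vectors_2d[:, 1])
    for word, (x, y) in zip(words, vectors_2d):
        ax.annotate(word, (x, y))
    ax.set_title(title)
plt.show()

Related words (the big cats, the fruits, the vehicles) should tend to cluster together in both plots, though the exact layout will differ because the two models were trained on different corpora and have different dimensionality.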