Visualizing and comparing word embeddings
Word embeddings are high-dimensional, making them hard to interpret directly. In this exercise, you'll project a few word vectors down to 2D using Principal Component Analysis (PCA) and visualize them. This helps reveal semantic groupings or similarities between words in the embedding space. Then, you will compare the embedding representations of two models: glove-wiki-gigaword-50, available through the variable model_glove_wiki, and glove-twitter-25, available through model_glove_twitter.
This exercise is part of the course
Natural Language Processing (NLP) in Python
Hands-on interactive exercise
Try this exercise by completing the sample code below.
# Imports assumed to be preloaded in the exercise environment
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

words = ["lion", "tiger", "leopard", "banana", "strawberry", "truck", "car", "bus"]
# Extract word embeddings from the Wikipedia GloVe model
word_vectors = [model_glove_wiki[word] for word in words]
# Reduce the 50-dimensional vectors to 2D with PCA
pca = PCA(n_components=2)
word_vectors_2d = pca.fit_transform(word_vectors)
# Plot each word at its 2D coordinates and label the points
plt.scatter(word_vectors_2d[:, 0], word_vectors_2d[:, 1])
for word, (x, y) in zip(words, word_vectors_2d):
    plt.annotate(word, (x, y))
plt.title("GloVe Wikipedia Word Embeddings (2D PCA)")
plt.show()
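
For the comparison half of the exercise, a minimal sketch along the same lines is given below. It assumes model_glove_twitter is preloaded as a gensim KeyedVectors object just like model_glove_wiki, and that every word in the list is in the Twitter model's vocabulary (an out-of-vocabulary word would raise a KeyError).

# Sketch of the comparison step, assuming model_glove_twitter is preloaded
# like model_glove_wiki; the 25-dimensional Twitter vectors are projected the same way
word_vectors_twitter = [model_glove_twitter[word] for word in words]
twitter_2d = PCA(n_components=2).fit_transform(word_vectors_twitter)
plt.scatter(twitter_2d[:, 0], twitter_2d[:, 1])
for word, (x, y) in zip(words, twitter_2d):
    plt.annotate(word, (x, y))
plt.title("GloVe Twitter Word Embeddings (2D PCA)")
plt.show()

Comparing the two scatter plots shows whether each model keeps the animals, fruits, and vehicles in distinct groups; any differences reflect the corpora the embeddings were trained on (Wikipedia/Gigaword text versus tweets) as well as their different dimensionalities.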