
Visualizing and comparing word embeddings

Word embeddings are high-dimensional, which makes them hard to interpret directly. In this exercise, you'll project a few word vectors down to 2D using Principal Component Analysis (PCA) and visualize them, which helps reveal semantic groupings and similarities between words in the embedding space. You will then compare the representations learned by two models: glove-wiki-gigaword-50, available through the variable model_glove_wiki, and glove-twitter-25, available through model_glove_twitter.
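
In the exercise environment both models are already loaded for you. If you want to recreate the setup locally, a minimal sketch using gensim's downloader API could look like the snippet below; how the course actually loads the models is an assumption here, not something shown in the exercise.

import gensim.downloader as api

# Download (on first use) and load the two pre-trained GloVe models;
# each behaves like a dictionary mapping a word to its vector
model_glove_wiki = api.load("glove-wiki-gigaword-50")
model_glove_twitter = api.load("glove-twitter-25")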

This exercise is part of the course Natural Language Processing (NLP) in Python.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

words = ["lion", "tiger", "leopard", "banana", "strawberry", "truck", "car", "bus"]

# Extract word embeddings
word_vectors = [____[____] for word in words]

# Reduce dimensions with PCA
pca = PCA(n_components=2)
word_vectors_2d = pca.____(____)

# Plot the 2D projection and label each point with its word
plt.scatter(word_vectors_2d[:, 0], word_vectors_2d[:, 1])
for word, (x, y) in zip(words, word_vectors_2d):
    plt.annotate(word, (x, y))
plt.title("GloVe Wikipedia Word Embeddings (2D PCA)")
plt.show()
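
If you want to check your answer, one possible way to fill in the blanks is sketched below, extended to plot both models side by side so you can compare them. It assumes model_glove_wiki and model_glove_twitter are pre-loaded gensim KeyedVectors objects, so indexing a model with a word returns that word's vector; the side-by-side layout is an illustrative choice, not the exercise's required output.

from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

words = ["lion", "tiger", "leopard", "banana", "strawberry", "truck", "car", "bus"]

# model_glove_wiki and model_glove_twitter are assumed to be pre-loaded KeyedVectors
models = [(model_glove_wiki, "GloVe Wikipedia (50d)"),
          (model_glove_twitter, "GloVe Twitter (25d)")]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (model, title) in zip(axes, models):
    # Look up each word's embedding and project it onto two principal components
    word_vectors = [model[word] for word in words]
    word_vectors_2d = PCA(n_components=2).fit_transform(word_vectors)

    # Scatter the 2D points and label each one with its word
    ax.scatter(word_vectors_2d[:, 0], word_vectors_2d[:, 1])
    for word, (x, y) in zip(words, word_vectors_2d):
        ax.annotate(word, (x, y))
    ax.set_title(f"{title} Word Embeddings (2D PCA)")

plt.tight_layout()
plt.show()

Words from the same semantic group (big cats, fruits, vehicles) should typically appear close together, although the two models often arrange the clusters differently because they were trained on different corpora and have different dimensionalities (50 vs. 25).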