Classifying review sentiment
Now that you've calculated the embeddings, it's time to compute the cosine distances and extract the most similar label.
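Cosine distance is 1 minus the cosine similarity of two vectors, so a smaller distance means the vectors point in a more similar direction. This is why the most similar label is the one with the *minimum* distance. The sketch below is a pure-Python stand-in for `scipy.spatial.distance.cosine`, shown only for illustration. It assumes non-zero vectors.

```python
import math

def cosine_distance(u, v):
    # 1 - (u . v) / (|u| * |v|); the same definition scipy.spatial.distance.cosine uses
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

print(cosine_distance([1, 0], [1, 0]))   # → 0.0 (same direction)
print(cosine_distance([1, 0], [0, 1]))   # → 1.0 (orthogonal)
print(cosine_distance([1, 0], [-1, 0]))  # → 2.0 (opposite direction)
```

Distances therefore range from 0 to 2, and only direction matters, not vector length.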
You'll do this by defining a function called find_closest(), which compares one embedding against several others and returns the nearest distance and its index. You'll then loop over the reviews and use find_closest() to find the closest distance for each review, extracting the classified label using the index.
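The contract described above (one query vector in, the smallest distance and its position out) can be sketched compactly with `min()` over a generator. This is an illustrative alternative, not the exercise's required form: `cosine_distance` is a hypothetical pure-Python stand-in for `scipy.spatial.distance.cosine`, and the candidate vectors are made-up 2-D toy values.

```python
import math

def cosine_distance(u, v):
    # Hypothetical stand-in for scipy.spatial.distance.cosine
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def find_closest(query_vector, embeddings):
    """Return {"distance": d, "index": i} for the embedding nearest to query_vector."""
    return min(
        ({"distance": cosine_distance(query_vector, e), "index": i}
         for i, e in enumerate(embeddings)),
        key=lambda item: item["distance"],
    )

# Toy candidates: the query [0.1, 0.9] points almost along [0.0, 1.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
closest = find_closest([0.1, 0.9], candidates)
print(closest["index"])  # → 1
```

Because `min()` returns the first of several equal items, ties resolve to the lower index.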
The class_embeddings and review_embeddings objects you created in the last exercise are available for you to use, as well as the reviews and sentiments.
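Since find_closest() returns a position in class_embeddings and that same position is then used to subset sentiments, the two sequences must have the same length and the same order. A minimal sketch of that correspondence, assuming each entry of sentiments is a dict with a 'label' key; the vectors and labels here are hypothetical toy values, not the real embeddings.

```python
# Hypothetical toy stand-ins; real embeddings have many more dimensions
sentiments = [{"label": "Positive"}, {"label": "Neutral"}, {"label": "Negative"}]
class_embeddings = [[0.9, 0.1], [0.6, 0.6], [0.1, 0.9]]

# Position i in class_embeddings must describe sentiments[i]
assert len(class_embeddings) == len(sentiments), "embeddings and labels are misaligned"

for index, (embedding, sentiment) in enumerate(zip(class_embeddings, sentiments)):
    print(index, sentiment["label"], embedding)
```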
This exercise is part of the course Introduction to Embeddings with the OpenAI API.
Exercise instructions
- Define a function called find_closest() that returns the distance and index of the most similar embedding to the query_vector.
- Use find_closest() to find the closest distance between each review's embedding and the class_embeddings.
- Use the 'index' of closest to subset sentiments and extract the 'label'.
Hands-on interactive exercise
The sample code below fills in each blank following the instructions above.
# distance.cosine comes from SciPy
from scipy.spatial import distance

# Define a function to return the minimum distance and its index
def find_closest(query_vector, embeddings):
    distances = []
    for index, embedding in enumerate(embeddings):
        # Cosine distance between the query and this candidate embedding
        dist = distance.cosine(query_vector, embedding)
        distances.append({"distance": dist, "index": index})
    # The smallest cosine distance is the most similar embedding
    return min(distances, key=lambda x: x["distance"])

for index, review in enumerate(reviews):
    # Find the closest distance and its index using find_closest()
    closest = find_closest(review_embeddings[index], class_embeddings)
    # Subset sentiments using the index from closest
    label = sentiments[closest["index"]]["label"]
    print(f'"{review}" was classified as {label}')
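To see the whole classification loop run end to end without calling the OpenAI API, here is a self-contained sketch on hypothetical toy data. The 2-D vectors, reviews, and labels are invented for illustration, and `cosine_distance` is a pure-Python stand-in for `scipy.spatial.distance.cosine` so the example has no third-party dependencies.

```python
import math

def cosine_distance(u, v):
    # Pure-Python stand-in for scipy.spatial.distance.cosine
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def find_closest(query_vector, embeddings):
    distances = []
    for index, embedding in enumerate(embeddings):
        dist = cosine_distance(query_vector, embedding)
        distances.append({"distance": dist, "index": index})
    return min(distances, key=lambda x: x["distance"])

# Hypothetical toy data: 2-D vectors instead of real API embeddings
sentiments = [{"label": "Positive"}, {"label": "Neutral"}, {"label": "Negative"}]
class_embeddings = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
reviews = ["Loved it!", "It was fine.", "Never again."]
review_embeddings = [[0.95, 0.05], [0.6, 0.65], [0.1, 0.99]]

labels = []
for index, review in enumerate(reviews):
    closest = find_closest(review_embeddings[index], class_embeddings)
    label = sentiments[closest["index"]]["label"]
    labels.append(label)
    print(f'"{review}" was classified as {label}')
# → "Loved it!" was classified as Positive
# → "It was fine." was classified as Neutral
# → "Never again." was classified as Negative
```

With real embeddings the vectors are much longer, but the loop and the index-based lookup work the same way.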