Get startedGet started for free

Classifying review sentiment

Now that you've calculated the embeddings, it's time to compute the cosine distances and extract the most similar label.

You'll do this by defining a function called find_closest(), which can be used to compare the embeddings between one vector and multiple others, and return the nearest distance and its index. You'll then loop over the reviews and and use find_closest() to find the closest distance for each review, extracting the classified label using the index.

The class_embeddings and review_embeddings objects you created in the last exercise are available for you to use, as well as the reviews and sentiments.

This exercise is part of the course

Introduction to Embeddings with the OpenAI API

View Course

Exercise instructions

  • Define a function called find_closest() that returns the distance and index of the most similar embedding to the query_vector.
  • Use find_closest() to find the closest distance between each review's embeddings and the class_embeddings.
  • Use the 'index' of closest to subset sentiments and extract the 'label'.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Define a function to return the minimum distance and its index
def find_closest(query_vector, embeddings):
  distances = []
  for index, embedding in enumerate(embeddings):
    dist = distance.cosine(____, ____)
    distances.append({"distance": dist, "index": index})
  return ____(distances, key=lambda x: x["distance"])

for index, review in enumerate(reviews):
  # Find the closest distance and its index using find_closest()
  closest = ____(review_embeddings[____], ____)
  # Subset sentiments using the index from closest
  label = ____
  print(f'"{review}" was classified as {label}')
Edit and Run Code