Classifying review sentiment

Now that you've calculated the embeddings, it's time to compute the cosine distances and extract the most similar label.

You'll do this by defining a function called find_closest(), which compares one query embedding against a list of other embeddings and returns the smallest cosine distance together with its index. You'll then loop over the reviews and use find_closest() to find the closest class embedding for each review, using the returned index to extract the classified label.
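As a rough illustration of the idea described above, here is a minimal sketch of a nearest-embedding search on toy 2-D vectors. It is not the exercise solution as such: the helper cosine_distance() is a hand-written stand-in for the distance.cosine() call used in the exercise, and the toy_embeddings values are made up for illustration.

```python
import math

def cosine_distance(u, v):
    # Hand-written stand-in (assumption) for a library cosine distance:
    # 1 minus the cosine similarity of the two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def find_closest(query_vector, embeddings):
    # Record the distance from the query to every candidate embedding,
    # remembering each candidate's position in the list.
    distances = []
    for index, embedding in enumerate(embeddings):
        dist = cosine_distance(query_vector, embedding)
        distances.append({"distance": dist, "index": index})
    # The smallest distance marks the most similar embedding.
    return min(distances, key=lambda x: x["distance"])

# Toy data (hypothetical): three 2-D "class" vectors and one query vector.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
closest = find_closest([0.9, 0.1], toy_embeddings)
print(closest["index"], round(closest["distance"], 4))
```

The query points almost in the same direction as the first toy vector, so that vector's index is the one returned. Keeping the index alongside the distance is what later allows the matching label to be looked up.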

The class_embeddings and review_embeddings objects you created in the last exercise are available for you to use, as well as the reviews and sentiments.

This exercise is part of the course

Introduction to Embeddings with the OpenAI API


Exercise instructions

Define a function called find_closest() that returns the distance and index of the most similar embedding to the query_vector.
Use find_closest() to find the closest distance between each review's embeddings and the class_embeddings.
Use the 'index' of closest to subset sentiments and extract the 'label'.
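The last instruction, turning an index into a label, might look like the sketch below. It assumes sentiments is a list of dictionaries that each carry a 'label' key, as the instruction's wording suggests; the example reviews and the precomputed closest_per_review results are hypothetical stand-ins for what find_closest() would return.

```python
# Assumed shape of sentiments: one dict per class, each with a "label" key.
sentiments = [{"label": "Positive"}, {"label": "Neutral"}, {"label": "Negative"}]

# Hypothetical reviews and the results find_closest() might return for them.
reviews = ["Loved every minute of it.", "Would not buy this again."]
closest_per_review = [
    {"distance": 0.12, "index": 0},
    {"distance": 0.30, "index": 2},
]

labels = []
for review, closest in zip(reviews, closest_per_review):
    # The index of the nearest class embedding selects the matching sentiment.
    label = sentiments[closest["index"]]["label"]
    labels.append(label)
    print(f'"{review}" was classified as {label}')
```

This only works if the class embeddings were created in the same order as sentiments, so that position i in one list corresponds to position i in the other.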

Hands-on interactive exercise

Try this exercise by completing the sample code below.

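# distance is assumed here to be SciPy's spatial distance module
from scipy.spatial import distance

# Define a function to return the minimum distance and its index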
def find_closest(query_vector, embeddings):
  distances = []
  for index, embedding in enumerate(embeddings):
    dist = distance.cosine(____, ____)
    distances.append({"distance": dist, "index": index})
  return ____(distances, key=lambda x: x["distance"])

for index, review in enumerate(reviews):
  # Find the closest distance and its index using find_closest()
  closest = ____(review_embeddings[____], ____)
  # Subset sentiments using the index from closest
  label = ____
  print(f'"{review}" was classified as {label}')
