Sorting by similarity
Now that you've embedded all of your features, the next step is to compute the similarities. In this exercise, you'll define a function called find_n_closest(), which computes the cosine distances between a query vector and a list of embeddings and returns the n smallest distances and their indexes.
In the next exercise, you'll use this function to enable your semantic product search application.
The distance module has been imported from scipy.spatial.
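SciPy documents distance.cosine as one minus the cosine similarity, 1 - u·v / (||u|| ||v||). The minimal pure-Python sketch below uses a hypothetical helper named cosine_distance to make that arithmetic visible and to show the range of values to expect: 0 for vectors pointing the same way, 1 for orthogonal vectors, and 2 for opposite ones. It assumes neither vector is all zeros, because a zero norm would make the denominator zero.

```python
import math

def cosine_distance(u, v):
    """Hypothetical helper: 1 minus the cosine similarity of u and v.

    Assumes neither u nor v is the zero vector.
    """
    dot = sum(a * b for a, b in zip(u, v))      # u . v
    norm_u = math.sqrt(sum(a * a for a in u))   # ||u||
    norm_v = math.sqrt(sum(b * b for b in v))   # ||v||
    return 1 - dot / (norm_u * norm_v)

print(cosine_distance([1, 0], [2, 0]))   # same direction -> 0.0
print(cosine_distance([1, 0], [0, 1]))   # orthogonal -> 1.0
print(cosine_distance([1, 0], [-1, 0]))  # opposite direction -> 2.0
```

In the exercise itself, distance.cosine is the function to call; this helper only spells out what it computes.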
This exercise is part of the course Introduction to Embeddings with the OpenAI API.
Exercise instructions
- Calculate the cosine distance between the query_vector and embedding.
- Append a dictionary containing dist and its index to the distances list.
- Sort the distances list by the 'distance' key of each dictionary.
- Return the first n elements in distances_sorted.
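The last two steps, sorting by the 'distance' key and keeping the first n records, can be tried on hand-written records. The distance values below are made up for illustration and do not come from real embeddings.

```python
# Hand-written distance records (illustrative values only)
distances = [
    {"distance": 0.42, "index": 0},
    {"distance": 0.10, "index": 1},
    {"distance": 0.42, "index": 2},
    {"distance": 0.05, "index": 3},
]

# Sort ascending by 'distance': a smaller cosine distance means more similar
distances_sorted = sorted(distances, key=lambda d: d["distance"])

# Keep only the n closest records
n = 3
print([d["index"] for d in distances_sorted[:n]])  # [3, 1, 0]
```

Python's sort is stable, so records with equal distances (indexes 0 and 2 here) keep their original order, and the earlier embedding wins the tie.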
One way to complete the sample code, with the distance import made explicit so it runs on its own:

    from scipy.spatial import distance

    def find_n_closest(query_vector, embeddings, n=3):
        distances = []
        for index, embedding in enumerate(embeddings):
            # Calculate the cosine distance between the query vector and embedding
            dist = distance.cosine(query_vector, embedding)
            # Append the distance and index to distances
            distances.append({"distance": dist, "index": index})
        # Sort distances by the distance key (ascending: most similar first)
        distances_sorted = sorted(distances, key=lambda x: x["distance"])
        # Return the first n elements in distances_sorted
        return distances_sorted[:n]
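Sorting every distance is more work than strictly needed when only n records are kept. As an alternative that is not part of the exercise, the standard library's heapq.nsmallest returns the same records as sorted(...)[:n] for the same key. The sketch below uses a hypothetical function find_n_closest_heap and a pure-Python cosine_distance standing in for distance.cosine, so it runs without SciPy; it assumes no embedding or query is the zero vector.

```python
import heapq
import math

def cosine_distance(u, v):
    """Stand-in for scipy.spatial.distance.cosine: 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)

def find_n_closest_heap(query_vector, embeddings, n=3):
    """Hypothetical variant of find_n_closest built on heapq.nsmallest."""
    # Generator of {"distance", "index"} records, one per embedding
    records = (
        {"distance": cosine_distance(query_vector, emb), "index": i}
        for i, emb in enumerate(embeddings)
    )
    # Keep the n records with the smallest distance, in ascending order
    return heapq.nsmallest(n, records, key=lambda r: r["distance"])

# Toy 2-D "embeddings" for illustration
embeddings = [[1, 0], [0, 1], [1, 1], [-1, 0]]
result = find_n_closest_heap([1, 0], embeddings, n=2)
print([r["index"] for r in result])  # [0, 2]
```

For small lists like the exercise's product catalog, the plain sorted approach is simpler and just as fast in practice; nsmallest mainly pays off when n is much smaller than the number of embeddings.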