Sorting by similarity
Now that you've embedded all of your features, the next step is to compute the similarities. In this exercise, you'll define a function called find_n_closest(), which computes the cosine distances between a query vector and a list of embeddings and returns the n smallest distances and their indexes.
In the next exercise, you'll use this function to enable your semantic product search application.
distance has been imported from scipy.spatial.
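For intuition about what is being measured, here is a minimal pure-Python sketch of cosine distance, taken as one minus the cosine similarity of two vectors. The helper name cosine_distance is hypothetical and used only for illustration; in the exercise itself the calculation is delegated to distance.cosine() from scipy.spatial.

```python
import math


def cosine_distance(u, v):
    """Illustrative helper (not part of the exercise): 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Vectors pointing the same way are at distance 0; orthogonal ones at distance 1.
same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Smaller values mean the two vectors point in more similar directions, which is why the function you are about to write keeps the n smallest distances.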
This exercise is part of the course
Introduction to Embeddings with the OpenAI API
Exercise instructions
- Calculate the cosine distance between the query_vector and embedding.
- Append a dictionary containing dist and its index to the distances list.
- Sort the distances list by the 'distance' key of each dictionary.
- Return the first n elements in distances_sorted.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
def find_n_closest(query_vector, embeddings, n=3):
    distances = []
    for index, embedding in enumerate(embeddings):
        # Calculate the cosine distance between the query vector and embedding
        dist = ____
        # Append the distance and index to distances
        distances.append({"distance": ____, "index": ____})
    # Sort distances by the distance key
    distances_sorted = ____
    # Return the first n elements in distances_sorted
    return ____
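
One way the blanks might be filled in is sketched below. It is a possible completion rather than the official solution, and it assumes SciPy is installed. The toy vectors and the names query, toy_embeddings, and closest are made up for illustration and are not the course's data.

```python
from scipy.spatial import distance


def find_n_closest(query_vector, embeddings, n=3):
    distances = []
    for index, embedding in enumerate(embeddings):
        # Calculate the cosine distance between the query vector and embedding
        dist = distance.cosine(query_vector, embedding)
        # Append the distance and index to distances
        distances.append({"distance": dist, "index": index})
    # Sort distances by the distance key (smallest, i.e. most similar, first)
    distances_sorted = sorted(distances, key=lambda x: x["distance"])
    # Return the first n elements in distances_sorted
    return distances_sorted[0:n]


# Toy 2-D vectors standing in for real embeddings (illustrative only)
query = [1.0, 0.0]
toy_embeddings = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]

closest = find_n_closest(query, toy_embeddings, n=2)
print([hit["index"] for hit in closest])
```

Returning each index alongside its distance lets the caller look up the original item, such as a product description, that produced each of the closest embeddings.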