Adding data to the collection
Time to add those Netflix films and TV shows to your collection! You've been provided with a list of document IDs and texts, stored in ids and documents, respectively, which have been extracted from netflix_titles.csv using the following code:
import csv

ids = []
documents = []
with open('netflix_titles.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for i, row in enumerate(reader):
        # Use the dataset's own show_id as the document ID
        ids.append(row['show_id'])
        # Combine title, type, description, and categories into one text to embed
        text = f"Title: {row['title']} ({row['type']})\nDescription: {row['description']}\nCategories: {row['listed_in']}"
        documents.append(text)
As an example of what information will be embedded, here's the first document from documents:
Title: Dick Johnson Is Dead (Movie)
Description: As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable.
Categories: Documentaries
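To see how a CSV row becomes the three-line document shown above, here is a minimal sketch that runs the same formatting on an in-memory sample instead of netflix_titles.csv. The build_documents helper and the sample string are hypothetical, and the show_id value s1 is an assumption; only the column names and the document layout come from the extraction code.

```python
import csv
import io


def build_documents(csvfile):
    """Return (ids, documents) using the same text layout as the extraction code."""
    ids, documents = [], []
    for row in csv.DictReader(csvfile):
        ids.append(row["show_id"])
        documents.append(
            f"Title: {row['title']} ({row['type']})\n"
            f"Description: {row['description']}\n"
            f"Categories: {row['listed_in']}"
        )
    return ids, documents


# Hypothetical one-row sample; the show_id "s1" is assumed for illustration.
# The description is quoted because it contains a comma.
sample = io.StringIO(
    "show_id,type,title,description,listed_in\n"
    's1,Movie,Dick Johnson Is Dead,"As her father nears the end of his life, '
    "filmmaker Kirsten Johnson stages his death in inventive and comical ways "
    'to help them both face the inevitable.",Documentaries\n'
)

ids, documents = build_documents(sample)
print(ids)
print(documents[0])
```

Because ids and documents are appended in the same loop iteration, the ID at position i always belongs to the document at position i, which is what lets the two lists be passed to the collection together.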
All of the necessary functions and packages have been imported, and a persistent client has been created and assigned to client.
This exercise is part of the course
Introduction to Embeddings with the OpenAI API
Exercise instructions
- Recreate your netflix_titles collection.
- Add the documents and their IDs to the collection.
- Print the number of documents in collection and the first ten items.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Recreate the netflix_titles collection
collection = client.____(
    name="netflix_titles",
    embedding_function=OpenAIEmbeddingFunction(model_name="text-embedding-3-small", api_key="")
)
# Add the documents and IDs to the collection
____
# Print the collection size and first ten items
print(f"No. of documents: {____}")
print(f"First ten documents: {____}")