Estimating embedding costs with tiktoken

Now that we've created a database and collection to store the Netflix films and TV shows, we can begin embedding data.

Before embedding a large dataset, it's important to estimate the cost so that you stay within your budget constraints. Because OpenAI models are priced by the number of input tokens, we'll use OpenAI's tiktoken library to count the tokens and convert that count into a dollar cost.
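
The conversion from a token count to dollars is plain arithmetic. Here is a minimal sketch, assuming the price is quoted per 1,000 tokens. The function name tokens_to_cost and the 500,000-token count are hypothetical, chosen only for illustration; the rate matches the cost_per_1k_tokens value defined in this exercise.

```python
def tokens_to_cost(total_tokens: int, cost_per_1k_tokens: float) -> float:
    """Convert a token count to a dollar cost, given a price per 1,000 tokens."""
    return total_tokens / 1000 * cost_per_1k_tokens


# Hypothetical dataset size: 500,000 tokens at $0.00002 per 1K tokens
example_cost = tokens_to_cost(500_000, 0.00002)
print(f"Cost: ${example_cost:.4f}")
```

Dividing by 1,000 first keeps the units explicit: the count becomes "thousands of tokens" before it is multiplied by the per-thousand rate.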

You've been provided with documents, which is a list containing all of the data to embed. You'll iterate over the list, encode each document, and count the total number of tokens. Finally, you'll use the model's pricing to convert this into a cost.
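
The counting step described above can be sketched independently of any particular tokenizer. In this sketch, count_total_tokens, whitespace_encode, and sample_documents are hypothetical names introduced for illustration. The whitespace splitter is only a stand-in so the example runs without tiktoken installed; it does not produce the same counts as a real encoder. In practice you would pass the encode method of the encoder returned by tiktoken.encoding_for_model.

```python
from typing import Callable, Iterable, Sequence


def count_total_tokens(
    documents: Iterable[str], encode: Callable[[str], Sequence]
) -> int:
    """Encode each document and sum the lengths of the resulting token lists."""
    return sum(len(encode(text)) for text in documents)


def whitespace_encode(text: str) -> list:
    """Stand-in tokenizer (assumption): one token per whitespace-separated word."""
    return text.split()


# Hypothetical sample data, standing in for the provided documents list
sample_documents = ["A heist thriller set in Madrid", "Stand-up comedy special"]

total = count_total_tokens(sample_documents, whitespace_encode)
print("Total tokens:", total)
```

Passing the encoder in as a callable keeps the counting logic separate from the tokenizer, so the same function works whether the tokens come from tiktoken or from a test stub.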

This exercise is part of the course

Introduction to Embeddings with the OpenAI API

Exercise instructions

Load the encoder for the text-embedding-3-small model.
Encode each text in documents, and sum the result to find the total number of tokens in the dataset, total_tokens.
Print the total number of tokens and the cost of those tokens using the model's cost_per_1k_tokens defined for you.

Interactive hands-on exercise

Try to solve this exercise by completing the sample code.

import tiktoken

# Load the encoder for the OpenAI text-embedding-3-small model
enc = tiktoken.encoding_for_model("____")

# Encode each text in documents and calculate the total tokens
total_tokens = ____(____(____) for ____ in documents)

cost_per_1k_tokens = 0.00002

# Display number of tokens and cost
print('Total tokens:', ____)
print('Cost:', ____)