Exercise

Estimating embedding costs with tiktoken

Now that we've created a database and a collection to store the Netflix films and TV shows, we can begin embedding the data.

Before embedding a large dataset, it's important to estimate the cost so that you don't exceed your budget. Because OpenAI's embedding models are priced by the number of input tokens, we'll use OpenAI's tiktoken library to count the tokens in the dataset and then convert that count into a dollar cost.
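The conversion step is simple arithmetic. Prices are quoted per 1,000 tokens, so the cost is the token count divided by 1,000 and then multiplied by that price. The sketch below shows this calculation. The function name estimate_cost is hypothetical, and the price of $0.00002 per 1,000 tokens is only an illustrative assumption. Check OpenAI's pricing page for the current rate of text-embedding-3-small.

```python
def estimate_cost(total_tokens: int, cost_per_1k_tokens: float) -> float:
    """Return the dollar cost of embedding `total_tokens` tokens.

    Prices are quoted per 1,000 tokens, so the token count is scaled
    down by 1,000 before multiplying by the price.
    """
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1000 * cost_per_1k_tokens


# Illustrative price only (assumption): look up the current rate before budgeting.
COST_PER_1K_TOKENS = 0.00002

# 250,000 tokens -> 250 blocks of 1,000 tokens -> 250 * $0.00002
print(f"${estimate_cost(250_000, COST_PER_1K_TOKENS):.6f}")  # prints $0.005000
```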

You've been provided with documents, a list containing all of the texts to embed. You'll iterate over the list, encode each document, and add up the number of tokens in each encoding to get the total. Finally, you'll use the model's price per 1,000 tokens to convert this total into a cost.

Instructions

  • Load the encoder for the text-embedding-3-small model.
  • Encode each text in documents and sum the lengths of the encodings to find the total number of tokens in the dataset, total_tokens.
  • Print the total number of tokens and their cost, using cost_per_1k_tokens (the model's price per 1,000 tokens), which has been defined for you.
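The three steps can be sketched as follows. This assumes a tiktoken release that recognizes the text-embedding-3-small model name. The helper names load_encoder, count_tokens, and report_cost are hypothetical, and the fallback to the cl100k_base encoding for older tiktoken versions is also an assumption. tiktoken is imported inside load_encoder, so the counting and reporting helpers work with any object that has an encode method.

```python
from typing import Iterable


def load_encoder(model: str = "text-embedding-3-small"):
    """Return the tiktoken encoding associated with `model`."""
    import tiktoken  # imported here so the helpers below can run without tiktoken

    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        # Assumption: tiktoken releases that predate this model name raise KeyError;
        # the text-embedding-3 models use the cl100k_base encoding.
        return tiktoken.get_encoding("cl100k_base")


def count_tokens(documents: Iterable[str], encoder) -> int:
    """Encode each document and sum the lengths of the resulting token lists."""
    total_tokens = 0
    for text in documents:
        total_tokens += len(encoder.encode(text))
    return total_tokens


def report_cost(documents: Iterable[str], encoder, cost_per_1k_tokens: float) -> float:
    """Print the total token count and its estimated cost, then return the cost."""
    total_tokens = count_tokens(documents, encoder)
    cost = total_tokens / 1000 * cost_per_1k_tokens
    print("Total tokens:", total_tokens)
    print(f"Cost: ${cost:.6f}")
    return cost
```

In the exercise, documents and cost_per_1k_tokens are already defined, so the call would be report_cost(documents, load_encoder(), cost_per_1k_tokens). The encoder is loaded once and passed in, not reloaded inside count_tokens. This avoids repeated tokenizer setup and also allows the counting logic to be checked with a simple stand-in encoder.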