1. The wonderful world of embeddings!

Hi, welcome to the course! I'm Emmanuel, and I'll be your guide as we learn about embeddings and the powerful applications they unlock. Let's dive right in!

2. What are embeddings?

Embeddings are a fundamental concept in Natural Language Processing, or NLP, in which text, whether a word, a phrase, or an entire document, is represented in numerical form.

3. What are embeddings?

More specifically, embedding models map text onto a multi-dimensional space, or vector space, and the numbers output by the model represent the location of the text in that space. Similar pieces of text, like the words teacher and student, are mapped close together in the space, while dissimilar pieces are mapped farther apart.

4. Why are embeddings useful?

This ability to map similar and dissimilar words means that embedding models can be used to capture the semantic meaning of text. By semantic meaning, we mean that the full context and intent of the text is captured. For example, "Which way is it to the supermarket?" and "Could I have directions to the shop?" are semantically very similar, but share only two words. A model using embeddings would recognize the semantic similarity between the two variations and return similar information. Let's take a look at some of the most powerful use cases enabled by embedding models.
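As a toy sketch of this idea: semantic similarity is often measured with cosine similarity between embedding vectors. The three-dimensional vectors below are made up for illustration (real embedding models output hundreds or thousands of dimensions), but they show how semantically similar sentences point in similar directions.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: close to 1.0 means
    # the vectors point in nearly the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, not real model outputs
supermarket_query = [0.9, 0.1, 0.3]   # "Which way is it to the supermarket?"
shop_query        = [0.8, 0.2, 0.35]  # "Could I have directions to the shop?"
unrelated         = [0.1, 0.9, -0.4]  # "My cat sleeps all day"

print(cosine_similarity(supermarket_query, shop_query))  # high
print(cosine_similarity(supermarket_query, unrelated))   # low
```

The two direction-asking sentences score much higher against each other than against the unrelated sentence, even though they share almost no words.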

5. Semantic search engines

Embeddings enable the creation of semantic search engines. Traditional search engines use keyword pattern recognition to return relevant results; for example, searching "comfortable running shoes" on a traditional search engine may return results containing the keywords "comfortable", "running", and "shoes", but may miss the true intent behind the searcher's query. It will also miss variations on the words provided, like "soft" instead of "comfortable" and "sneakers" instead of "shoes".

6. Semantic search engines

Semantic search engines use embeddings to better understand the intent and context behind search queries to return more relevant results. The search query would be passed to an embedding model to generate the numbers

7. Semantic search engines

that map it onto the vector space, and the embedded results closest to it would be returned.
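A minimal sketch of that last step, assuming the catalog has already been embedded. The vectors below are made up for illustration, and cosine similarity is used as one common measure of closeness in the vector space.

```python
import math

def cosine_similarity(a, b):
    # Higher cosine similarity means the vectors are closer in direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical pre-computed embeddings for a small product catalog
catalog = {
    "soft sneakers for jogging": [0.85, 0.20, 0.10],
    "formal leather dress shoes": [0.10, 0.90, 0.05],
    "cushioned trail running shoes": [0.80, 0.25, 0.15],
}

# Made-up embedding of the query "comfortable running shoes"
query_embedding = [0.9, 0.15, 0.1]

# Rank catalog items by similarity to the query; the closest come first
ranked = sorted(
    catalog,
    key=lambda doc: cosine_similarity(query_embedding, catalog[doc]),
    reverse=True,
)
print(ranked[0])  # → soft sneakers for jogging
```

Notice that the top result contains none of the query's keywords; it is returned because its embedding is closest to the query's embedding.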

8. Recommendation systems

Embeddings also enable more sophisticated recommendation systems. For example, a job posting website could use embedding models to recommend jobs for its users based on the descriptions of jobs they have already viewed. Using a semantic recommendation system means that even if job titles vary, the system will recommend jobs with descriptions that are most semantically similar to the jobs that have been previously viewed.

9. Classification

The final use case we'll discuss is classification, which works very similarly to recommendation. This can be used to classify sentiment, cluster observations, or categorize text, all based on the semantic similarity between pieces of text. We could use embeddings to classify news headlines by embedding each headline and assigning the label whose embedding is closest to it.
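Sketched out, the closest-label idea looks like this. The label names, headline, and vectors are all hypothetical; in practice both the labels and the headline would be embedded by the same model.

```python
import math

def cosine_similarity(a, b):
    # Similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings for candidate labels
label_embeddings = {
    "Sport": [0.9, 0.1, 0.0],
    "Business": [0.1, 0.9, 0.1],
    "Technology": [0.0, 0.2, 0.9],
}

# Made-up embedding for the headline "Stock markets rally after rate cut"
headline_embedding = [0.15, 0.85, 0.2]

# Assign the label whose embedding is closest to the headline's
label = max(
    label_embeddings,
    key=lambda lbl: cosine_similarity(headline_embedding, label_embeddings[lbl]),
)
print(label)  # → Business
```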

10. Creating an Embeddings request

OpenAI provides access to their embedding models via the Embeddings endpoint, and requests to it take a very similar form to other OpenAI endpoints. We'll use the openai library to create requests to the OpenAI API, which also requires us to have an OpenAI API key. To begin our request, we first need to instantiate the OpenAI client and pass it our API key. To create the request, we call the create method on client.embeddings. An OpenAI embedding model can be specified with the model argument, and the text to embed is passed to the input argument. We're specifying the input here as a text string, but the argument also accepts a list of strings. Finally, we'll call the .model_dump() method on the response to convert it into a dictionary, which is easier to work with, and print the result.

11. Embeddings response

The response from the API is extremely long, as the embedding model outputs 1536 numbers to represent the input string. Because we converted the response into a dictionary, we can dig into it using list and dictionary subsetting.

12. Extracting the embeddings

Here we can print the full list of 1536 numbers representing our text.
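To show the subsetting without making a live request, here is a heavily truncated mock shaped like the dictionary that .model_dump() returns; a real response holds all 1536 numbers plus usage metadata.

```python
# Truncated mock of the dictionary returned by response.model_dump()
response_dict = {
    "data": [
        {"embedding": [0.0023, -0.0098, 0.0154], "index": 0, "object": "embedding"}
    ],
    "model": "text-embedding-3-small",
    "object": "list",
}

# List and dictionary subsetting: first item in "data", then its "embedding"
embeddings = response_dict["data"][0]["embedding"]
print(embeddings)  # → [0.0023, -0.0098, 0.0154]
```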

13. Let's practice!

We'll discuss these numbers in more detail next, but for now, let's practice!