
Splitting semantically

All of the splitting strategies you've used up to this point have the same drawback: the split doesn't consider the context of the surrounding text, so context can easily be lost during splitting.

In this exercise, you'll create and apply a semantic text splitter, which is a cutting-edge experimental method for splitting text based on semantic meaning. When the splitter detects that the meaning of the text has deviated past a certain threshold, a split will be performed.
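
The idea of splitting when meaning drifts past a threshold can be sketched without any external service. The example below is a toy illustration only, not the splitter used in this exercise: `embed`, `cosine_distance`, and `semantic_split` are hypothetical helpers, and a bag-of-words count vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def semantic_split(sentences: list[str], threshold: float) -> list[str]:
    """Start a new chunk when consecutive sentences are farther apart than threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, curr in zip(sentences, sentences[1:]):
        if cosine_distance(embed(prev), embed(curr)) > threshold:
            chunks.append([curr])      # meaning shifted: open a new chunk
        else:
            chunks[-1].append(curr)    # still on topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]


sentences = [
    "Cats are small pets.",
    "Cats are playful pets.",
    "Stock markets fell sharply today.",
    "Stock markets may recover tomorrow.",
]
chunks = semantic_split(sentences, threshold=0.8)
for chunk in chunks:
    print(chunk)
```

Here the two sentences about cats stay together and the split falls where the topic changes to stock markets. A real semantic splitter would use dense embeddings from a model, which capture similarity in meaning rather than shared words.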

This exercise is part of the course

Retrieval Augmented Generation (RAG) with LangChain


Exercise instructions

  • Instantiate the 'text-embedding-3-small' embedding model from OpenAI.
  • Create a semantic text splitter that uses vector gradients to determine semantic similarity and uses 0.8 as the threshold at which to split.
  • Split the document using the semantic splitter.
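
To give some intuition for a gradient-based breakpoint, the sketch below applies a finite-difference gradient to a list of distances between consecutive sentences and flags positions where the distance is rising sharply. This is an assumption about what "gradient" means here, written in plain Python; the `gradient` and `breakpoints` helpers, the sample distances, and the 0.3 cutoff are hypothetical, and the library's own computation and its interpretation of the threshold amount may differ.

```python
def gradient(values: list[float]) -> list[float]:
    """Finite-difference gradient: one-sided at the ends, central in the interior."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    grads = [values[1] - values[0]]
    for i in range(1, n - 1):
        grads.append((values[i + 1] - values[i - 1]) / 2)
    grads.append(values[-1] - values[-2])
    return grads


def breakpoints(distances: list[float], cutoff: float) -> list[int]:
    """Return the indices whose distance gradient exceeds the cutoff."""
    return [i for i, g in enumerate(gradient(distances)) if g > cutoff]


# Hypothetical distances between consecutive sentence embeddings:
# mostly small, with one sharp jump at index 3.
distances = [0.10, 0.12, 0.11, 0.90, 0.15, 0.14]
split_points = breakpoints(distances, cutoff=0.3)
print(split_points)
```

Because the gradient measures change rather than absolute size, this kind of rule can still find topic shifts in text where all sentence-to-sentence distances are uniformly high or uniformly low.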

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Instantiate an OpenAI embeddings model
embedding_model = ____(api_key="", model='____')

# Create the semantic text splitter with desired parameters
semantic_splitter = ____(
    embeddings=____, breakpoint_threshold_type="____", breakpoint_threshold_amount=____
)

# Split the document
chunks = ____
print(chunks[0])