Splitting semantically

All of the splitting strategies you've used up to this point share the same drawback: they split without considering the meaning of the surrounding text, so closely related content can easily end up in separate chunks.

In this exercise, you'll create and apply a semantic text splitter, a cutting-edge, experimental method for splitting text based on semantic meaning. When the splitter detects that the meaning of the text has shifted past a set threshold, it performs a split.
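
To make the threshold idea concrete, here is a minimal sketch in plain Python. It is not how the splitter used in this exercise is implemented: the hand-written two-dimensional vectors stand in for real sentence embeddings, the helper names are hypothetical, and a fixed cosine-distance cutoff is used in place of the gradient-based breakpoints the exercise asks for.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def split_on_semantic_shift(sentences, vectors, threshold):
    """Start a new chunk whenever neighbouring vectors differ by more than threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine_distance(vectors[i - 1], vectors[i]) > threshold:
            chunks.append([])  # meaning shifted: open a new chunk
        chunks[-1].append(sentences[i])
    return [" ".join(chunk) for chunk in chunks]


# Toy data (assumption): the first two sentences point one way, the last two another.
sentences = [
    "Embeddings map text to vectors.",
    "Similar texts get similar vectors.",
    "Bread needs flour and water.",
    "Knead the dough before baking.",
]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

chunks = split_on_semantic_shift(sentences, vectors, threshold=0.5)
for chunk in chunks:
    print(chunk)
```

With these toy vectors, only the jump between the second and third sentences exceeds the cutoff, so the text is split once at the change of topic.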

This exercise is part of the course

Retrieval Augmented Generation (RAG) with LangChain

Instructions

  • Instantiate the 'text-embedding-3-small' embedding model from OpenAI.
  • Create a semantic text splitter that uses vector gradients to determine semantic similarity and uses 0.8 as the threshold at which to split.
  • Split the document using the semantic splitter.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Instantiate an OpenAI embeddings model
embedding_model = ____(api_key="", model='____')

# Create the semantic text splitter with desired parameters
semantic_splitter = ____(
    embeddings=____, breakpoint_threshold_type="____", breakpoint_threshold_amount=____
)

# Split the document
chunks = ____
print(chunks[0])
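
For reference, one way the pieces named in the instructions might fit together is sketched below. It assumes the `langchain_openai` and `langchain_experimental` packages are installed and that `OpenAIEmbeddings` and `SemanticChunker` are the intended classes. The wrapper function `build_semantic_splitter` is a hypothetical helper, not part of the exercise.

```python
def build_semantic_splitter(api_key):
    """Return a semantic splitter configured as the instructions describe (a sketch)."""
    # Imported inside the function so that defining it does not require the packages.
    from langchain_openai import OpenAIEmbeddings
    from langchain_experimental.text_splitter import SemanticChunker

    # Embedding model named in the instructions
    embedding_model = OpenAIEmbeddings(api_key=api_key, model="text-embedding-3-small")

    # Gradient-based breakpoints with 0.8 as the threshold amount
    return SemanticChunker(
        embeddings=embedding_model,
        breakpoint_threshold_type="gradient",
        breakpoint_threshold_amount=0.8,
    )
```

Assuming `document` is a list of LangChain `Document` objects loaded earlier in the course, the split itself would then look like `chunks = build_semantic_splitter("<your key>").split_documents(document)`.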