Get startedGet started for free

Splitting semantically

All of the splitting strategies you've used up to this point have the same drawback: the split doesn't consider the context of the surrounding text, so context can easily be lost during splitting.

In this exercise, you'll create and apply a semantic text splitter, which is a cutting-edge experimental method for splitting text based on semantic meaning. When the splitter detects that the meaning of the text has deviated past a certain threshold, a split will be performed.

This exercise is part of the course

Retrieval Augmented Generation (RAG) with LangChain

View Course

Exercise instructions

  • Instantiate the 'text-embedding-3-small' embedding model from OpenAI.
  • Create a semantic text splitter that uses vector gradients to determine semantic similarity and uses 0.8 as the threshold at which to split.
  • Split the document using the semantic splitter.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Instantiate an OpenAI embeddings model
embedding_model = ____(api_key="", model='____')

# Create the semantic text splitter with desired parameters
semantic_splitter = ____(
    embeddings=____, breakpoint_threshold_type="____", breakpoint_threshold_amount=____
)

# Split the document
chunks = ____
print(chunks[0])
Edit and Run Code