
Splitting by character

A key process in implementing Retrieval Augmented Generation (RAG) is splitting documents into chunks for storage in a vector database.

There are several splitting strategies available in LangChain, some with more complex routines than others. In this exercise, you'll implement a character text splitter, which splits documents on a specified separator (such as "\n") and measures chunk length by the number of characters.
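
To make the roles of the separator, chunk_size, and chunk_overlap more concrete, here is a minimal pure-Python sketch of the general idea. It is not LangChain's implementation: the function name split_by_character and its merging rules are assumptions made for illustration, and the real class may differ in its details.

```python
def split_by_character(text, separator="\n", chunk_size=24, chunk_overlap=10):
    """Hypothetical helper: split on `separator`, then greedily merge pieces.

    Pieces are joined back together while the result stays within
    `chunk_size` characters. When a chunk is closed, trailing pieces that
    fit within `chunk_overlap` characters are carried into the next chunk.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []

    def joined_len(parts):
        return len(separator.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            # Close the current chunk before it would grow past chunk_size.
            chunks.append(separator.join(current))
            # Drop leading pieces until the carry-over fits in chunk_overlap
            # and leaves room for the incoming piece.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


quote = 'Words are flowing out like endless rain into a paper cup,\nthey slither while they pass,\nthey slip away across the universe.'
chunks = split_by_character(quote, separator="\n", chunk_size=24, chunk_overlap=10)
lengths = [len(chunk) for chunk in chunks]
print(chunks)
print(lengths)

# A second input with short pieces, where merging and overlap are visible.
small = split_by_character("aa\nbb\ncc\ndd", separator="\n", chunk_size=5, chunk_overlap=2)
print(small)
```

In this sketch, every line of the quote is longer than 24 characters and contains no further separators, so each line ends up as its own oversized chunk. Splitting only on a single separator cannot guarantee that chunks stay within chunk_size, which is one reason to experiment with other strategies.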

Remember that there is no ideal splitting strategy; you may need to experiment with a few to find the right one for your use case.

This exercise is part of the course Developing LLM Applications with LangChain.

Exercise instructions

  • Import the appropriate LangChain class for splitting a document by character.
  • Define a character splitter that splits on "\n" with a chunk_size of 24 and chunk_overlap of 10.
  • Split quote, and print the chunks and chunk lengths.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the character splitter
from langchain_text_splitters import ____

quote = 'Words are flowing out like endless rain into a paper cup,\nthey slither while they pass,\nthey slip away across the universe.'
chunk_size = 24
chunk_overlap = 10

# Create an instance of the splitter class
splitter = ____

# Split the string and print the chunks
docs = ____
print(docs)
print([len(doc) for doc in docs])