Recursively splitting documents

Splitting on a single character is simple and predictable, but it often produces sub-optimal chunks. In this exercise, you'll apply recursive character splitting to the Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks paper you loaded in an earlier exercise.

Recall that recursive character splitting works through a list of characters in order: it splits on the first character, and any piece that is still above the chunk_size limit is split again on the next character in the list, and so on until the chunks fit.
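
To make the idea concrete, here is a minimal pure-Python sketch of the recursion. It is a simplified illustration, not LangChain's implementation: the function name recursive_split is hypothetical, and unlike LangChain's splitter it drops the separators, does not merge small pieces back together, and applies no overlap.

```python
def recursive_split(text, separators, chunk_size):
    """Simplified sketch: split on each separator in turn until pieces fit."""
    # Base case: the text already fits (empty pieces are discarded).
    if len(text) <= chunk_size:
        return [text] if text else []

    # No separators left, or the empty-string separator: cut at fixed widths.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, remaining = separators[0], separators[1:]
    chunks = []
    for piece in text.split(sep):
        # Pieces that are still too long are retried with the next separator.
        chunks.extend(recursive_split(piece, remaining, chunk_size))
    return chunks


sample = "First line here.\nSecond line is quite a bit longer than the first one."
for chunk in recursive_split(sample, ["\n", ".", " ", ""], chunk_size=20):
    print(repr(chunk))
```

In this sketch the first line fits after splitting on the newline, so it is kept whole, while the longer second line falls through to the space separator. A real splitter would then recombine those small pieces into chunks close to the size limit.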

This exercise is part of the course

Retrieval Augmented Generation (RAG) with LangChain

Exercise instructions

  • Define a LangChain recursive character text splitter to split recursively through the character list ['\n', '.', ' ', ''] with a chunk size of 75 and chunk overlap of 10.
  • Split document using the text_splitter you defined and an appropriate method.
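
To see how a chunk size of 75 and a chunk overlap of 10 relate, the sketch below slides a fixed window over a string so that each chunk starts 65 characters after the previous one. This is an assumption-laden simplification: the helper window_chunks is hypothetical, and LangChain applies overlap while merging split pieces, so real overlaps follow separator boundaries and may be shorter than 10 characters.

```python
def window_chunks(text, chunk_size=75, chunk_overlap=10):
    """Simplified sketch: fixed-width chunks that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    stride = chunk_size - chunk_overlap  # 75 - 10 = 65 new characters per chunk
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += stride
    return chunks


# Hypothetical 200-character input, used only for illustration.
demo_text = "".join(chr(97 + i % 26) for i in range(200))
demo_chunks = window_chunks(demo_text)
print([len(c) for c in demo_chunks])
print(demo_chunks[0][-10:] == demo_chunks[1][:10])
```

The last 10 characters of each chunk reappear at the start of the next one, which helps preserve context that would otherwise be cut at a chunk boundary.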

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

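# Imports assumed for running outside the course environment
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the paper as a list of Document objects
loader = PyPDFLoader("rag_paper.pdf")
document = loader.load()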

# Define a text splitter that splits recursively through the character list
text_splitter = ____(
    ____,
    chunk_size=75,  
    chunk_overlap=10  
)

# Split the document using text_splitter
chunks = text_splitter.____
print(chunks)
print([len(chunk.page_content) for chunk in chunks])