Splitting Python files
Although text files and code files are made of the same characters, code has structure that natural language lacks, such as class and function definitions. To retain this code-specific context during document splitting, the splitter should first try to split on the most common code structures. Fortunately, LangChain provides functionality to do just that!
All of the necessary classes have been imported for you, including Language from langchain_text_splitters.
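To make the idea of "splitting on code structures first" concrete, here is a minimal standard-library sketch. It is not LangChain's implementation: the separator list, the `split_code` helper, and the sample source are all hypothetical, and the sketch leaves out chunk overlap and the merging of small pieces.

```python
# Hypothetical separator list: code structures first, then generic text breaks.
PYTHON_SEPARATORS = ["\nclass ", "\ndef ", "\n\n", "\n", " "]


def split_code(text, chunk_size, separators=PYTHON_SEPARATORS):
    """Recursively split text, trying the earliest separator first."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    if sep not in text:
        return split_code(text, chunk_size, rest)

    # Keep the separator at the start of the piece it introduces,
    # so "class"/"def" keywords stay attached to their bodies.
    parts = text.split(sep)
    pieces = [parts[0]] + [sep + part for part in parts[1:]]

    chunks = []
    for piece in pieces:
        # Oversized pieces are split again with the finer separators.
        chunks.extend(split_code(piece, chunk_size, rest))
    return chunks


source = '''import math

class Circle:
    def __init__(self, r):
        self.r = r

def area(c):
    return math.pi * c.r ** 2

def perimeter(c):
    return 2 * math.pi * c.r
'''

chunks = split_code(source, chunk_size=80)
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n{chunk}\n")
```

Because class and function boundaries are tried before blank lines or spaces, each chunk tends to hold a whole definition instead of cutting one in half.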
This exercise is part of the course
Retrieval Augmented Generation (RAG) with LangChain
Exercise instructions
- Create a recursive character splitter that will split on common Python code structures.
- Split the python_data document loader into chunks.
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Create a Python-aware recursive character splitter
python_splitter = RecursiveCharacterTextSplitter.____(
    ____, chunk_size=300, chunk_overlap=100
)
# Split the Python content into chunks
chunks = ____
for i, chunk in enumerate(chunks[:3]):
    print(f"Chunk {i+1}:\n{chunk.page_content}\n")