Splitting by character

A key process in implementing Retrieval Augmented Generation (RAG) is splitting documents into chunks for storage in a vector database.

There are several splitting strategies available in LangChain, some with more complex routines than others. In this exercise, you'll implement a character text splitter, which splits documents based on characters and measures the chunk length by the number of characters.
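To make the idea concrete, here is a minimal plain-Python sketch of character-based splitting. It is not LangChain's implementation; the helper name `split_by_character` and its merging rule are assumptions made for illustration. It cuts only at the separator, merges pieces while the chunk stays within `chunk_size` characters, and carries trailing pieces forward when they fit within `chunk_overlap`.

```python
def split_by_character(text, separator, chunk_size, chunk_overlap):
    """Hypothetical helper: split on `separator`, then greedily merge pieces.

    Chunk length is measured in characters. Pieces are never cut internally,
    so a piece longer than `chunk_size` is returned as an oversized chunk.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []

    def joined_len(parts):
        # Length in characters of the parts once re-joined with the separator
        return len(separator.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            # The current chunk is full: emit it
            chunks.append(separator.join(current))
            # Keep only trailing pieces that fit in the overlap budget
            # and still leave room for the incoming piece
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


text = "one\ntwo\nthree\nfour\nfive"
chunks = split_by_character(text, separator="\n", chunk_size=10, chunk_overlap=4)
print(chunks)                    # ['one\ntwo', 'two\nthree', 'four\nfive']
print([len(c) for c in chunks])  # [7, 9, 9]
```

In this toy run, "two" is repeated in the first and second chunks because it fits in the 4-character overlap, while "three" (5 characters) is too long to carry over. Because the sketch only cuts at the separator, any piece longer than `chunk_size` comes back whole.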

Remember that there is no ideal splitting strategy; you may need to experiment with a few to find the right one for your use case.

This exercise is part of the course

Developing LLM Applications with LangChain

Exercise instructions

Import the CharacterTextSplitter class from langchain_text_splitters.
Create a CharacterTextSplitter instance with separator="\n", chunk_size=24, and chunk_overlap=10.
Use the .split_text() method to split the quote and print the chunks and chunk lengths.

Hands-on interactive exercise

Try to solve this exercise by completing the sample code.

# Import the character splitter
from langchain_text_splitters import ____

quote = 'Words are flowing out like endless rain into a paper cup,\nthey slither while they pass,\nthey slip away across the universe.'
chunk_size = 24
chunk_overlap = 10

# Create an instance of the splitter class
splitter = CharacterTextSplitter(
    separator=____,
    chunk_size=____,
    chunk_overlap=____)

# Split the string and print the chunks
docs = splitter.____(quote)
print(docs)
print([len(doc) for doc in docs])
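Before running the splitter, it can help to check how long each newline-separated line of the quote is compared with `chunk_size`. The following is a plain-Python check that does not use LangChain; the variable name `line_lengths` is introduced here for illustration only.

```python
quote = 'Words are flowing out like endless rain into a paper cup,\nthey slither while they pass,\nthey slip away across the universe.'
chunk_size = 24

# Length in characters of each piece between "\n" separators
line_lengths = [len(line) for line in quote.split("\n")]
print(line_lengths)                                # [57, 29, 35]

# Every piece is already longer than the requested chunk size
print(all(n > chunk_size for n in line_lengths))   # True
```

A splitter that cuts only at the separator cannot make these pieces any shorter, so chunks longer than `chunk_size` are to be expected for this quote.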

Welcome to the LangChain framework for building applications on LLMs! You'll learn about the main components of LangChain, including models, chains, agents, prompts, and parsers. You'll create chatbots using both open-source models from Hugging Face and proprietary models from OpenAI, create prompt templates, and integrate different chatbot memory strategies to manage context and resources during conversations.

Exercise 1: The LangChain ecosystem
Exercise 2: OpenAI models in LangChain!
Exercise 3: Hugging Face models in LangChain!
Exercise 4: Prompt templates
Exercise 5: Prompt templates and chaining
Exercise 6: Chat prompt templates
Exercise 7: Few-shot prompting
Exercise 8: Creating the few-shot example set
Exercise 9: Building the few-shot prompt template
Exercise 10: Implementing few-shot prompting

Time to level up your LangChain chains! You'll learn to use the LangChain Expression Language (LCEL) for defining chains with greater flexibility. You'll create sequential chains, where inputs are passed between components to create more advanced applications. You'll also begin to integrate agents, which use LLMs for decision-making.

Exercise 1: Sequential chains
Exercise 2: Building prompts for sequential chains
Exercise 3: Sequential chains with LCEL
Exercise 4: Introduction to LangChain agents
Exercise 5: What's an agent?
Exercise 6: ReAct agents
Exercise 7: Custom tools for agents
Exercise 8: Defining a function for tool use
Exercise 9: Creating custom tools
Exercise 10: Integrating custom tools with agents

One limitation of LLMs is that they have a knowledge cut-off due to being trained on data up to a certain point. In this chapter, you'll learn to create applications that use Retrieval Augmented Generation (RAG) to integrate external data with LLMs. The RAG workflow contains a few different processes, including splitting data, creating and storing the embeddings using a vector database, and retrieving the most relevant information for use in the application. You'll learn to master the entire workflow!

Exercise 1: Integrating document loaders
Exercise 2: PDF document loaders
Exercise 3: CSV document loaders
Exercise 4: HTML document loaders
Exercise 5: Splitting external data for retrieval
Exercise 6: Splitting by character (current exercise)
Exercise 7: Recursively splitting by character
Exercise 8: Splitting HTML
Exercise 9: RAG storage and retrieval using vector databases
Exercise 10: Preparing the documents and vector database
Exercise 11: Building a retrieval prompt template
Exercise 12: Creating a RAG chain
Exercise 13: Wrap-up!