
Splitting HTML

In this exercise, you'll split an HTML document containing an executive order on AI issued by the US White House in October 2023. To retain as much context as possible in each chunk, you'll split using larger chunk_size and chunk_overlap values.

All of the LangChain classes necessary for completing this exercise have been pre-loaded for you.

This exercise is part of the course

Developing LLM Applications with LangChain

Exercise instructions

  • Create an UnstructuredHTMLLoader for white_house_executive_order_nov_2023.html, and load it into memory.
  • Set a chunk_size of 300 and a chunk_overlap of 100.
  • Create a RecursiveCharacterTextSplitter that splits on the '.' character, use its .split_documents() method to split data, and print the resulting chunks.
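
To build intuition for how chunk_size and chunk_overlap interact when splitting on '.', here is a minimal pure-Python sketch. It is a simplified illustration, not LangChain's actual algorithm: the function name split_on_period is hypothetical, and the sample text and small values (25 and 12) are assumptions chosen so the overlap is easy to see.

```python
import re


def split_on_period(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: pack '.'-terminated sentences into overlapping chunks."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Split after each '.', keeping the period attached to its sentence.
    pieces = [p.strip() for p in re.findall(r"[^.]+\.?", text) if p.strip()]

    chunks: list[str] = []
    window: list[str] = []  # sentences in the chunk currently being built
    for piece in pieces:
        if window and len(" ".join(window + [piece])) > chunk_size:
            # The next sentence would overflow, so emit the current chunk.
            chunks.append(" ".join(window))
            # Carry over only a tail of at most chunk_overlap characters.
            while window and len(" ".join(window)) > chunk_overlap:
                window.pop(0)
            # Shrink the tail further if the new sentence still would not fit.
            while window and len(" ".join(window + [piece])) > chunk_size:
                window.pop(0)
        # A single sentence longer than chunk_size is kept whole (never cut).
        window.append(piece)
    if window:
        chunks.append(" ".join(window))
    return chunks


sample = "Alpha one. Beta two. Gamma three. Delta four."
chunks = split_on_period(sample, chunk_size=25, chunk_overlap=12)
for chunk in chunks:
    print(repr(chunk))
```

In this sketch each chunk repeats the last sentence of the previous one, which is the effect a larger chunk_overlap is meant to achieve: context at a chunk boundary appears in both neighbouring chunks.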

Hands-on interactive exercise

Complete the sample code below to finish this exercise.

# Load the HTML document into memory
loader = UnstructuredHTMLLoader(____)
data = loader.____()

# Define variables
chunk_size = ____
chunk_overlap = ____

# Split the HTML
splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    separators=____)

docs = splitter.____(data)
print(docs)