
Splitting the play into Acts

Converting unstructured text into hierarchical lexical graphs is an iterative process that involves building splitters for each lexical entity and then splitting by each in turn.
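As a rough illustration of splitting "by each in turn", the sketch below uses only Python's `re` module rather than LangChain. The sample text, the `split_keep_heading` helper, and the `hierarchy` dictionary are all hypothetical names introduced for this example; it assumes every act chunk begins with its own heading.

```python
import re

# Tiny stand-in for the play; the real text is much longer.
sample = (
    "ACT I\n\nSCENE I. A public place.\nSAMPSON.\nGregory, on my word...\n\n"
    "SCENE II. A street.\nCAPULET.\nBut Montague is bound...\n\n"
    "ACT II\n\nSCENE I. An open place.\nROMEO.\nCan I go forward..."
)

def split_keep_heading(text, heading_pattern):
    """Split before each heading so every chunk starts with its heading."""
    # The lookahead (?=...) consumes only the blank line, not the heading.
    parts = re.split(rf"\n\n(?={heading_pattern})", text)
    return [p.strip() for p in parts if p.strip()]

# First level: acts. Second level: scenes within each act.
hierarchy = {}
for act in split_keep_heading(sample, r"ACT "):
    act_title = act.split("\n")[0]
    scenes = split_keep_heading(act, r"SCENE ")
    # Assumption: the first piece is the act heading itself, so skip it.
    hierarchy[act_title] = [scene.split("\n")[0] for scene in scenes[1:]]

print(hierarchy)
```

Each level reuses the same splitting step with a different pattern, which is the iterative idea described above. A real lexical graph would store the chunk text as well as the headings.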

In this exercise, you'll design a splitter to split the play, Romeo and Juliet, into acts. Here is a preview of the structure of the play:

The Project Gutenberg eBook of Romeo and Juliet
This ebook is for the use of anyone anywhere in the United States...

**PROLOGUE:**

 Enter Chorus.

CHORUS.
Two households, both alike in dignity...

ACT I

SCENE I. A public place.
 Enter Sampson and Gregory armed with swords and bucklers.

SAMPSON.
Gregory, on my word, we’ll not carry coals...
...

This exercise is part of the course

Graph RAG with LangChain and Neo4j


Exercise instructions

  • Update the separators argument to also split the text on the pattern \n\nACT.
  • Configure the act_splitter to treat the separators list as regular expressions.
  • Split romeo_and_juliet using act_splitter.
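To see why it matters whether separators are treated as regular expressions, the following sketch compares a regex separator with the same separator written as a plain string. It uses only Python's `re` module. The sample text and variable names are made up for illustration, and escaping a literal with `re.escape` is shown as one plausible way to handle non-regex separators, not as a description of LangChain's internals.

```python
import re

# Hypothetical snippet resembling the end of a Project Gutenberg file.
text = "ACT V\nFinal lines.\n\n*** END OF THE PROJECT GUTENBERG EBOOK\n\nLicense text"

# Written as a regex: the asterisks must be escaped, since * is a quantifier.
regex_sep = r"\n\n\*\*\* END"

# Written as a plain string: no escaping, the characters are meant literally.
literal_sep = "\n\n*** END"

# A literal separator can be made regex-safe by escaping it first.
escaped_literal = re.escape(literal_sep)

by_regex = re.split(regex_sep, text)
by_literal = re.split(escaped_literal, text)

print(by_regex)
print(by_regex == by_literal)
```

Both splits cut the text at the same place. The separators in this exercise are already written in the escaped form, which is why they need to be interpreted as regular expressions.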

Hands-on interactive exercise

Try this exercise by completing the sample code below.

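# Assumes the langchain-text-splitters package is installed
from langchain_text_splitters import RecursiveCharacterTextSplitter

act_splitter = RecursiveCharacterTextSplitter(
  separators=[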
    r"\n\nTHE PROLOGUE.",
    r"\n\n\*\*\* END",
    # Split by the word ACT
    r"____"
  ],
  # Configure the patterns as regular expressions
  ____=True
)

# Split the play using act_splitter
acts = act_splitter.____(____)

for act in acts:
  print(act.strip().split("\n")[0])