
Splitting the play into Acts

Converting unstructured text into a hierarchical lexical graph is an iterative process: you build a splitter for each lexical entity, such as acts and scenes, and then split the text by each in turn.
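To make "split by each in turn" concrete, here is a minimal stdlib-only sketch, not the course's code. It splits a made-up, heavily abbreviated stand-in for the play into acts, then splits each act into scenes, and records the parent-child edges a lexical graph would store. The helper names `segments` and `build_lexical_graph`, the `play/act1/scene2` node IDs, and the heading patterns are all hypothetical. The course itself uses LangChain splitters and, judging by its title, Neo4j as the graph store; here a dict and an edge list stand in.

```python
import re


def segments(text: str, pattern: str) -> list[str]:
    """Return pieces of text that each start at a match of pattern and run to the next match.

    Text before the first match (front matter, an act heading) is not included in any piece.
    """
    starts = [m.start() for m in re.finditer(pattern, text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


def build_lexical_graph(text: str, levels: list[tuple[str, str]]):
    """Split text level by level, returning node texts and (parent, child) edges."""
    nodes = {"play": text}
    edges = []
    parents = ["play"]
    for label, pattern in levels:
        children = []
        for parent in parents:
            # Each level is split out of the pieces produced by the previous level.
            for i, piece in enumerate(segments(nodes[parent], pattern), start=1):
                child = f"{parent}/{label}{i}"
                nodes[child] = piece
                edges.append((parent, child))
                children.append(child)
        parents = children
    return nodes, edges


# A made-up, heavily abbreviated stand-in for the play text.
play = (
    "The Project Gutenberg eBook of Romeo and Juliet"
    "\n\nACT I"
    "\n\nSCENE I. A public place.\nSAMPSON.\n(dialogue)"
    "\n\nSCENE II. (setting)\n(dialogue)"
    "\n\nACT II"
    "\n\nSCENE I. (setting)\n(dialogue)"
)

# Hypothetical heading patterns, outermost level first.
levels = [("act", r"\n\nACT "), ("scene", r"\n\nSCENE ")]

nodes, edges = build_lexical_graph(play, levels)
for parent, child in edges:
    print(parent, "->", child)
```

Each level only ever searches inside the pieces produced by the level above it, which is what keeps scenes attached to the correct act.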

In this exercise, you'll design a splitter to split the play, Romeo and Juliet, into acts. Here is a preview of the structure of the play:

The Project Gutenberg eBook of Romeo and Juliet
This ebook is for the use of anyone anywhere in the United States...

THE PROLOGUE

 Enter Chorus.

CHORUS.
Two households, both alike in dignity...

ACT I

SCENE I. A public place.
 Enter Sampson and Gregory armed with swords and bucklers.

SAMPSON.
Gregory, on my word, we’ll not carry coals...
...

This exercise is part of the course Graph RAG with LangChain and Neo4j.


Exercise instructions

  • Update the separators argument to also split the text on the pattern \n\nACT.
  • Configure the act_splitter to treat the separators list as regular expressions.
  • Split romeo_and_juliet using act_splitter.
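The second instruction matters because, as we understand LangChain's `RecursiveCharacterTextSplitter`, each separator is passed through `re.escape` unless `is_separator_regex=True`. The stdlib-only sketch below (not LangChain code) checks the exercise's three patterns against a made-up excerpt both ways: used as a regex, `\n` matches a newline; escaped, it matches a literal backslash followed by `n`, which the play never contains.

```python
import re

# A made-up excerpt shaped like the Gutenberg file (not the real text).
text = (
    "The Project Gutenberg eBook of Romeo and Juliet"
    "\n\nTHE PROLOGUE.\n\nCHORUS.\nTwo households, both alike in dignity..."
    "\n\nACT I\n\nSCENE I. A public place."
    "\n\n*** END OF THE PROJECT GUTENBERG EBOOK"
)

# The three separators from the exercise, including the \n\nACT pattern.
separators = [r"\n\nTHE PROLOGUE.", r"\n\n\*\*\* END", r"\n\nACT"]

results = {}
for sep in separators:
    # is_separator_regex=True: the pattern is used as written.
    found_as_regex = re.search(sep, text) is not None
    # is_separator_regex=False: the pattern is escaped, so r"\n" means a literal
    # backslash followed by "n", and r"\*" means a literal backslash and asterisk.
    found_as_literal = re.search(re.escape(sep), text) is not None
    results[sep] = (found_as_regex, found_as_literal)
    print(f"{sep!r}: regex={found_as_regex}, literal={found_as_literal}")
```

Every pattern is found when treated as a regex and none is found when escaped, so without the flag the splitter would never split on these boundaries.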


Have a go at this exercise by completing this sample code.

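from langchain_text_splitters import RecursiveCharacterTextSplitter

# Assumes romeo_and_juliet already holds the full text of the play
act_splitter = RecursiveCharacterTextSplitter(
  separators=[
    r"\n\nTHE PROLOGUE.",
    r"\n\n\*\*\* END",
    # Split on the act headings, which follow a blank line
    r"____"
  ],
  # Configure the patterns as regular expressions
  ____=True
)

# Split the play using act_splitter
acts = act_splitter.____(____)

# Print the first line of each chunk, such as "ACT I"
for act in acts:
  print(act.strip().split("\n")[0])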
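Because `romeo_and_juliet` is not available outside the exercise, here is a rough, stdlib-only approximation of what `RecursiveCharacterTextSplitter` does with these separators, run on a made-up, abbreviated stand-in for the play. It is a simplification, not LangChain's implementation: it tries the separators in order, splits on the first one that occurs, keeps each match at the start of the following chunk (which, as we understand it, is LangChain's default), and recurses with the remaining separators into any chunk longer than `chunk_size`. It omits LangChain's merging of small neighbouring chunks and its chunk overlap. The function names and the tiny `chunk_size=60` are hypothetical choices for the demo.

```python
import re


def split_before_matches(text: str, pattern: str) -> list[str]:
    """Split text just before each regex match, so each match starts a new chunk."""
    starts = [m.start() for m in re.finditer(pattern, text)]
    bounds = [0] + starts + [len(text)]
    # Drop the empty piece produced when the text itself starts with a match.
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Simplified recursive splitting: no merging of small chunks, no overlap."""
    # Use the first separator that actually occurs in the text.
    for i, sep in enumerate(separators):
        if re.search(sep, text):
            pieces = split_before_matches(text, sep)
            remaining = separators[i + 1:]
            break
    else:
        return [text]  # no separator applies, so keep the text whole

    chunks = []
    for piece in pieces:
        if len(piece) > chunk_size and remaining:
            # Still too long: try the remaining separators on this piece.
            chunks.extend(recursive_split(piece, remaining, chunk_size))
        else:
            chunks.append(piece)
    return chunks


# A made-up, heavily abbreviated stand-in for romeo_and_juliet.
play = (
    "The Project Gutenberg eBook of Romeo and Juliet"
    "\n\nTHE PROLOGUE.\n\nCHORUS.\nTwo households, both alike in dignity..."
    "\n\nACT I\n\nSCENE I. A public place.\nSAMPSON.\n(dialogue)"
    "\n\nACT II\n\nSCENE I. (setting)\n(dialogue)"
    "\n\n*** END OF THE PROJECT GUTENBERG EBOOK"
)
separators = [r"\n\nTHE PROLOGUE.", r"\n\n\*\*\* END", r"\n\nACT"]

acts = recursive_split(play, separators, chunk_size=60)
titles = [act.strip().split("\n")[0] for act in acts]
for title in titles:
    print(title)
```

Try raising `chunk_size` to 10_000: the piece after the front matter is then short enough to keep whole, the recursion never reaches `\n\nACT`, and the acts stay in one chunk. On the real play, each act is far longer than LangChain's default chunk size (4,000 characters, as we understand its defaults), so the order of the separators, outermost boundary first, decides which split applies.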