Domain graphs

1. Domain graphs

While lexical graphs model the structure of the text, like chapters and pages, a domain graph is structured around the entities in the text.

2. Domain graphs

Domain graphs model real-world concepts, describing how they're related. Creating a domain graph from unstructured text involves extracting entities and their relationships from the text, and saving them in a knowledge graph.

3. Lexical graphs to domain graphs

It's time for a sing-along! The lyrics to the song "Don't Stop Believin'" by Journey mention a

4. Lexical graphs to domain graphs

city boy, who we can label as a person

5. Lexical graphs to domain graphs

and the location South Detroit. Properties can be applied to both of these nodes to describe the entities that they represent.

6. Lexical graphs to domain graphs

The lyrics also contain information on the relationships between these entities. The boy was born

7. Lexical graphs to domain graphs

and raised in South Detroit.
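To make the idea concrete, the two entities and their relationships can be sketched with plain Python dataclasses. The class names, property names, and relationship types (BORN_IN, RAISED_IN) are assumptions chosen for illustration, not a fixed schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str  # entity type, e.g. "Person" or "Location"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: Node
    type: str
    target: Node


# Entities mentioned in the lyrics (property names are illustrative)
boy = Node(id="boy", label="Person", properties={"description": "the boy in the lyrics"})
south_detroit = Node(id="south-detroit", label="Location", properties={"name": "South Detroit"})

# Relationships stated by the lyrics
relationships = [
    Relationship(boy, "BORN_IN", south_detroit),
    Relationship(boy, "RAISED_IN", south_detroit),
]

# Print each relationship in a Cypher-like pattern notation
for rel in relationships:
    print(f"({rel.source.label}:{rel.source.id})-[:{rel.type}]->({rel.target.label}:{rel.target.id})")
```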

8. An example domain graph

In a different example from the world of finance, we can use a domain graph to understand a customer's purchase history. A person holds a credit card, which they use to place orders. Each order is placed at a store, which is located at a physical address. Knowing which company the person works for, and where its office is situated, can provide insight into the customer's buying habits, which may be influenced by salary or proximity.
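As a rough sketch, this purchase-history graph can be written as a list of (source, relationship, target) triples and queried by following relationship types. Every identifier and relationship name below is made up for illustration.

```python
# (source, relationship, target) triples; all identifiers are hypothetical
triples = [
    ("person:alice", "HOLDS", "card:1234"),
    ("card:1234", "USED_FOR", "order:1"),
    ("card:1234", "USED_FOR", "order:2"),
    ("order:1", "PLACED_AT", "store:grocer"),
    ("order:2", "PLACED_AT", "store:cafe"),
    ("store:grocer", "LOCATED_AT", "address:1-main-st"),
    ("store:cafe", "LOCATED_AT", "address:9-office-rd"),
    ("person:alice", "WORKS_AT", "company:acme"),
    ("company:acme", "LOCATED_AT", "address:9-office-rd"),
]


def targets(source, rel_type):
    """Return every node reached from `source` via `rel_type`, in insertion order."""
    return [t for s, r, t in triples if s == source and r == rel_type]


def follow(start, path):
    """Follow a sequence of relationship types, starting from `start`."""
    frontier = [start]
    for rel_type in path:
        frontier = [t for node in frontier for t in targets(node, rel_type)]
    return frontier


store_addresses = follow("person:alice", ["HOLDS", "USED_FOR", "PLACED_AT", "LOCATED_AT"])
office_addresses = follow("person:alice", ["WORKS_AT", "LOCATED_AT"])

# Store addresses that coincide with the address of the person's office
near_office = [a for a in store_addresses if a in office_addresses]
print(near_office)
```

In a graph database the same question would be a single path query; the triple list only shows how the relationships connect a person to the places where they shop and work.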

9. Structured outputs

One of the most reliable ways to extract entities and relationships from unstructured text is to use an LLM with structured outputs. Put simply, we define each entity in a structured way with Pydantic classes, then use an LLM to generate instances of those structures. A Pydantic class inherits from Pydantic’s BaseModel class, and its properties describe the entity we want to extract. Here we start a class to extract character entities from Romeo and Juliet. Each property is declared with Pydantic’s Field function, where we can provide a description and a list of examples. Our extracted Character objects should have an ID, which is the character's name in slug case as demonstrated in the list of examples, a name, and an optional family property to represent the family they belong to. To extract more than one entity at a time, we define another class that acts as a container, with its characters property typed as a list. So the CharacterOutput object here will output a list of Character objects.
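A minimal sketch of the two classes described here might look like the following. The descriptions and example values are assumptions; only the class names and the id, name, family, and characters properties come from the text.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Character(BaseModel):
    """A character in the play."""

    id: str = Field(
        description="The character's name in slug case",
        examples=["romeo-montague", "juliet-capulet"],
    )
    name: str = Field(
        description="The character's name as it appears in the text",
        examples=["Romeo", "Juliet"],
    )
    # Optional: not every character belongs to a family
    family: Optional[str] = Field(
        default=None,
        description="The family the character belongs to, if known",
        examples=["Montague", "Capulet"],
    )


class CharacterOutput(BaseModel):
    """Container so that several characters can be returned at once."""

    characters: List[Character] = Field(
        description="All characters mentioned in the text"
    )


# Building an instance by hand shows the shape the LLM is asked to produce
example = CharacterOutput(
    characters=[
        Character(id="romeo-montague", name="Romeo", family="Montague"),
        Character(id="friar-laurence", name="Friar Laurence"),
    ]
)
print([c.id for c in example.characters])
```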

10. Using structured outputs

We then define an LLM and enable structured outputs by calling the .with_structured_output() method on the LangChain model, passing it the CharacterOutput class. The method returns a new runnable that wraps the model: whenever it is invoked, it adds the output instructions to the request and attempts to parse the response into an instance of the structured output class, in this case CharacterOutput. The output from this chain is therefore a CharacterOutput object, with the extracted characters under its characters field. We can now pull these characters out of the output and use them to create the domain graph.
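The step can be sketched as a small helper that accepts any LangChain chat model supporting .with_structured_output(). The helper name extract_characters, the prompt wording, and the model named in the commented usage are assumptions for illustration; the classes are repeated in compact form so the block stands alone.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Character(BaseModel):
    id: str = Field(description="Name in slug case", examples=["romeo-montague"])
    name: str = Field(description="Character name", examples=["Romeo"])
    family: Optional[str] = Field(default=None, description="Family name, if any")


class CharacterOutput(BaseModel):
    characters: List[Character] = Field(description="Characters found in the text")


def extract_characters(llm, text: str) -> List[Character]:
    """Ask a LangChain chat model for a CharacterOutput and return its characters."""
    # with_structured_output() wraps the model so that responses are parsed
    # into the given schema instead of being returned as free text
    structured_llm = llm.with_structured_output(CharacterOutput)
    output = structured_llm.invoke(f"Extract the characters from this text:\n\n{text}")
    return output.characters


# Example usage (assumes the langchain-openai package and an OPENAI_API_KEY;
# the model name and the input line are only illustrations):
#
#   from langchain_openai import ChatOpenAI
#   llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
#   characters = extract_characters(llm, "Enter ROMEO, of the house of Montague.")
```

Keeping the model as a parameter means the same helper works with any chat model that implements structured outputs, and it can be exercised with a stub in tests.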

11. Saving entities

We can use a list comprehension to convert these Character objects into Node objects. For each character under the characters field of the output, we extract the ID, name, and family properties assigned by the LLM. We also give these nodes the “Character” type. Then, we create a GraphDocument from the character nodes and use the .add_graph_documents() method to save them to the graph.

12. Let's practice!

Now it's your turn to extract a structured output from the play.
