Combining lexical graphs with vector search

1. Combining lexical graphs with vector search

While vector search will identify semantically similar concepts in the text, node relationships reveal additional insights about the information it contains.

2. Using relationships in semantic search

Say we use vector search to find reviews of what it's like to work at Neo4j. The review embeddings, stored under an embedding node property, let vector search retrieve text that is similar to the question.

3. Using relationships in semantic search

The relationships reveal that this review was posted by a user

4. Using relationships in semantic search

who just so happens to be the CEO of Neo4j. Having access to this additional relationship information can make responses more reliable and trustworthy.

5. Text properties to embeddings

We can use the text property of each chunk stored in the graph

6. Text properties to embeddings

to create embeddings that capture the semantic meaning of the text. We can then take advantage of Neo4j's vector indexes to perform semantic search and return relationships alongside the node properties for additional context.

7. Neo4j as a vector store

The Neo4jVector class allows us to work with vector indexes in Neo4j. The .from_documents() method creates nodes and an underlying index from a list of Documents, and the .from_existing_graph() method will create an index from nodes that already exist in a knowledge graph.

8. Chunking existing node properties

Previously, we created the hierarchy of Acts and Scenes in a lexical graph. Because the text property on each Scene node is too large to produce meaningful embeddings, we need to split it into smaller chunks. We can iteratively build out the lexical graph by querying the database for any Scene nodes without a HAS_LINE relationship.
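As a rough sketch of that lookup, the query below finds Scene nodes that have no outgoing HAS_LINE relationship. It assumes a graph object with a .query() method that returns a list of dictionaries, as LangChain's Neo4jGraph does. The find_unchunked_scenes helper and the scene_id alias are illustrative names, not part of the course code.

```python
# Cypher that matches Scene nodes with no outgoing HAS_LINE relationship yet.
UNCHUNKED_SCENES_QUERY = """
MATCH (s:Scene)
WHERE NOT (s)-[:HAS_LINE]->()
RETURN elementId(s) AS scene_id, s.text AS text
"""


def find_unchunked_scenes(graph):
    """Return one dict per scene that still needs to be split into lines.

    `graph` is assumed to expose .query(cypher) and return a list of dicts
    (the behaviour of LangChain's Neo4jGraph).
    """
    return graph.query(UNCHUNKED_SCENES_QUERY)
```

Because the query only returns scenes that have not been processed, it can be re-run safely as the graph is built out.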

9. Chunking existing node properties

The text property can be split using the pattern of two new lines to create chunks that consist of individual stage directions or character lines. Chunks corresponding to a stage direction contain one line, whereas spoken lines start with the character's name, followed by the words that they speak on a new line.

10. Chunking existing node properties

This can be done with a text splitter, but we'll use string methods to perform the split manually. We split the text into chunks using the "\n\n" pattern and the .split() method. Chunks that span multiple lines are lines spoken by a character. For each spoken line, we split the chunk into two parts, the character name and the spoken words, and assign these to the character and text variables.
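A minimal sketch of this manual split is shown below. The split_scene function name, the dictionary layout, and the sample text are assumptions made for illustration; only the "\n\n" pattern and the character/text split come from the description above.

```python
def split_scene(scene_text):
    """Split a scene's text into stage directions and spoken lines."""
    lines = []
    for chunk in scene_text.split("\n\n"):
        chunk = chunk.strip()
        if not chunk:
            continue  # ignore empty chunks caused by extra blank lines
        if "\n" in chunk:
            # Spoken line: the first line is the character, the rest is speech.
            character, text = chunk.split("\n", 1)
            lines.append({"character": character.strip(), "text": text.strip()})
        else:
            # Stage direction: a single line with no speaker.
            lines.append({"character": None, "text": chunk})
    return lines


# Made-up sample text in the format described above.
sample = (
    "Enter two servants.\n\n"
    "FIRST SERVANT\nWho goes there?\nSpeak up.\n\n"
    "SECOND SERVANT\nA friend."
)
for line in split_scene(sample):
    print(line)
```

Splitting on the first newline only (the second argument to .split()) keeps multi-line speeches together in a single text value.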

11. Chunking existing node properties

From there, we create nodes for the line and the character, then create the HAS_LINE relationship from the scene to the line and the SPOKEN_BY relationship from the line to the character.
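One way this write step might look is sketched below. The Line and Character labels, the name property, matching the scene by elementId, and the write_lines helper are all assumptions for illustration. The graph argument is assumed to expose .query(cypher, params), as LangChain's Neo4jGraph does.

```python
# Spoken line: create the Line, link it to its Scene and to its Character.
SPOKEN_LINE_QUERY = """
MATCH (s:Scene) WHERE elementId(s) = $scene_id
CREATE (s)-[:HAS_LINE]->(l:Line {text: $text})
MERGE (c:Character {name: $character})
CREATE (l)-[:SPOKEN_BY]->(c)
"""

# Stage direction: create the Line and link it to its Scene only.
STAGE_DIRECTION_QUERY = """
MATCH (s:Scene) WHERE elementId(s) = $scene_id
CREATE (s)-[:HAS_LINE]->(:Line {text: $text})
"""


def write_lines(graph, scene_id, lines):
    """Write each line to the graph; `lines` holds dicts with 'character' and 'text'."""
    for line in lines:
        params = {"scene_id": scene_id, "text": line["text"]}
        if line["character"] is None:
            graph.query(STAGE_DIRECTION_QUERY, params)
        else:
            params["character"] = line["character"]
            graph.query(SPOKEN_LINE_QUERY, params)
```

MERGE is used for the character so that a speaker who appears in many lines is stored as a single node, while each line gets its own node via CREATE.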

12. Creating a vector index on an existing graph

With these nodes in the graph, the .from_existing_graph() method can create an index, or reuse an existing index with the same configuration. The method expects an embedding model and database credentials. The node_label argument specifies the label to create the index on. text_node_properties is a list of properties that are combined to create the embedding; here it is just the text property. embedding_node_property is the property name under which the embedding is saved, and index_name sets the name of the index.
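A hedged sketch of such a call is shown below. It assumes the langchain_neo4j package is installed (Neo4jVector can also be imported from langchain_community.vectorstores in older setups). The Line label and the index name "line_embeddings" are illustrative values, and build_vector_store is a hypothetical helper. The import sits inside the function so the configuration can be inspected without the package installed.

```python
# Illustrative index settings; the label and index name are assumptions.
INDEX_CONFIG = {
    "node_label": "Line",
    "text_node_properties": ["text"],
    "embedding_node_property": "embedding",
    "index_name": "line_embeddings",
}


def build_vector_store(embedding, url, username, password):
    """Create, or reuse, a vector index over nodes that already exist in the graph.

    `embedding` is any LangChain embedding model; the credentials are passed
    straight through to the Neo4j driver.
    """
    from langchain_neo4j import Neo4jVector  # imported lazily; assumed installed

    return Neo4jVector.from_existing_graph(
        embedding=embedding,
        url=url,
        username=username,
        password=password,
        **INDEX_CONFIG,
    )
```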

13. Vector retrieval with LCEL

From our Neo4j vector store, we create a LangChain retriever that can be invoked in a chain to find similar documents.
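As a small sketch, a retriever can be built with the vector store's .as_retriever() method and piped into a formatting function using LCEL's | operator. The helper names, the value of k, and the formatting step are assumptions for illustration. The vector_store argument is assumed to be a LangChain vector store such as Neo4jVector.

```python
def build_retriever(vector_store, k=3):
    """Wrap a LangChain vector store in a retriever returning the top-k matches."""
    return vector_store.as_retriever(search_kwargs={"k": k})


def format_docs(docs):
    """Join the text of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_retrieval_chain(vector_store, k=3):
    """Compose retriever and formatter; LCEL wraps the plain function as a runnable."""
    return build_retriever(vector_store, k) | format_docs
```

With a real vector store, build_retrieval_chain(vector_store).invoke("a question about the play") would be expected to return the text of the most similar lines as a single string.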

14. Let's practice!

Later in the chapter, we'll create a retrieval chain for this hybrid approach, but for now, let's combine graphs and embeddings!
