Graph-based entity resolution
Oh no! Even with a bullet-proof prompt, few-shot examples, and our fingers crossed, the LLM has still hallucinated a character called Romeo Capulet and assigned him a few lines in the database.
{
"id": "romeo-capulet",
"name": "Romeo Capulet",
"family": "Montague"
}
Luckily, you can use the relationships in the knowledge graph to decide whether this node is likely an accidental duplicate of another node named Romeo Montague.
The similarity_cypher Cypher statement uses the MEMBER_OF and INTERACTS_WITH relationships to build an arbitrary similarity score with the following properties:
Condition | Cypher | Points Modifier |
---|---|---|
A MEMBER_OF relationship to the same family | af = bf | +1 |
The same name property | a.name = b.name | +2 |
Percentage of characters that a and b interact with | size(inCommon) / size(aInteractsWith) | Multiplied by percentage of b that a interacts with |
From your domain knowledge, you know that a score of 2 indicates a strong correlation.
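To make the scoring rules concrete, here is a minimal pure-Python sketch of one possible reading of the table. It is not the course's similarity_cypher statement: the dictionary layout, the interacts_with field, and the sample characters are all made up for illustration, and it assumes the overlap term (the share of a's interactions that are in common, multiplied by the share of b's) is added to the family and name points.

```python
def similarity_score(a, b):
    """Score how similar b is to a, under one assumed reading of the table."""
    score = 0.0
    if a["family"] == b["family"]:  # same family: +1
        score += 1
    if a["name"] == b["name"]:      # same name property: +2
        score += 2
    in_common = a["interacts_with"] & b["interacts_with"]
    # Assumption: the overlap term is the product of both shares, added on top.
    if a["interacts_with"] and b["interacts_with"]:
        share_of_a = len(in_common) / len(a["interacts_with"])
        share_of_b = len(in_common) / len(b["interacts_with"])
        score += share_of_a * share_of_b
    return score


# Hypothetical sample data, not taken from the course database.
romeo_capulet = {
    "name": "Romeo Capulet",
    "family": "Montague",
    "interacts_with": {"juliet", "mercutio", "benvolio"},
}
romeo_montague = {
    "name": "Romeo Montague",
    "family": "Montague",
    "interacts_with": {"juliet", "mercutio", "benvolio"},
}
tybalt = {
    "name": "Tybalt",
    "family": "Capulet",
    "interacts_with": {"juliet", "mercutio"},
}

print(similarity_score(romeo_capulet, romeo_montague))
print(similarity_score(romeo_capulet, tybalt))
```

Under these assumptions, a shared family plus a fully overlapping set of interactions is enough to reach a score of 2, even though the two names differ.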
This exercise is part of the course Graph RAG with LangChain and Neo4j.
Exercise instructions
- Query graph with the Cypher query (similarity_cypher) to calculate similarity scores between "romeo-capulet" and the other character nodes.
- Extract the "bId", "bFamily", and "score", in that order, from each row in result.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Query the graph with similarity_cypher
result = ____
# Extract and print the results
for row in result:
    print(row['____'], 'from', row['____'], 'has similarity score of ', row['____'])
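Once you have the scores, a natural next step is to keep only the candidates that clear the threshold. The sketch below shows one way to do that. The rows are made-up stand-ins for the query result, likely_duplicates is a hypothetical helper, and treating any score of 2 or more as a strong match is an assumption based on the threshold mentioned above.

```python
STRONG_MATCH = 2  # threshold from the exercise text; ">= 2" is an assumption

# Made-up rows standing in for the query result (a list of dictionaries).
rows = [
    {"bId": "romeo-montague", "bFamily": "Montague", "score": 2.0},
    {"bId": "benvolio-montague", "bFamily": "Montague", "score": 1.25},
    {"bId": "tybalt-capulet", "bFamily": "Capulet", "score": 0.5},
]


def likely_duplicates(rows, threshold=STRONG_MATCH):
    """Return the rows at or above the threshold, highest score first."""
    matches = [row for row in rows if row["score"] >= threshold]
    return sorted(matches, key=lambda row: row["score"], reverse=True)


for row in likely_duplicates(rows):
    print(row["bId"], "from", row["bFamily"], "has similarity score of", row["score"])
```

Keeping the threshold in a named constant makes it easy to adjust if your domain knowledge suggests a different cut-off.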