Neo4j: Enforce Unique Relationships between Nodes when Reading from CSV

Time:07-03

I am building a graph in Neo4j and run into a problem when importing a sizable number of nodes and relationships from CSV. Here is a recreation of what happens: the example has 4 ClinicalTrial nodes that share Treatment and TreatmentClass nodes and IN_CLASS relationships. When these are created while reading from a CSV, the relationships end up duplicated. Is there a way to use APOC schemas or any other tool to prevent this? I know I can remediate it with Cypher afterwards, but I want to prevent it from happening in the first place.

This Cypher code reproduces the problem (NOTE: it only mimics, statement by statement, what happens sequentially when the rows are read from CSV):

MERGE (c1:ClinicalTrial {name: "CT1"})
MERGE (t1:Treatment {name: "Caffeine", tClass: "Drug"})
MERGE (tc1:TreatmentClass {name: "Drug"})
MERGE (c1)-[:INTERVENTION]-(t1)-[:IN_CLASS]-(tc1)
    
MERGE (c2:ClinicalTrial {name: "CT2"})
MERGE (t2:Treatment {name: "Placebo", tClass: "Drug"})
MERGE (tc2:TreatmentClass {name: "Drug"})
MERGE (c2)-[:INTERVENTION]-(t2)-[:IN_CLASS]-(tc2)
    
MERGE (c3:ClinicalTrial {name: "CT3"})
MERGE (t3:Treatment {name: "Placebo", tClass: "Supplement"})
MERGE (tc3:TreatmentClass {name: "Supplement"})
MERGE (c3)-[:INTERVENTION]-(t3)-[:IN_CLASS]-(tc3)

MERGE (c4:ClinicalTrial {name: "CT4"})
MERGE (t4:Treatment {name: "Placebo", tClass: "Supplement"})
MERGE (tc4:TreatmentClass {name: "Supplement"})
MERGE (c4)-[:INTERVENTION]-(t4)-[:IN_CLASS]-(tc4)

I also tried the APOC schema shown further down, but it does not work (a similar schema without the tClass constraint used to work). It gives me the following error:

ERROR Neo.ClientError.Schema.ConstraintValidationFailed Node(23) already exists with label Treatment and property tClass = 'Drug'

Here is the nonworking schema:

// Create Schemas Clinical Trials
CALL apoc.schema.assert(
    null,
    {ClinicalTrial: ['name'],
    Treatment: ['name', 'tClass'],
    TreatmentClass: ['name']}
    ,true
)
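
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the error message names tClass on its own, which suggests the call above created one uniqueness constraint per listed property instead of a single constraint on the (name, tClass) pair. Since several Treatment nodes share tClass = 'Drug', such a constraint cannot hold. A minimal sketch of a composite constraint in native Cypher, assuming Neo4j 5 syntax and a hypothetical constraint name:

```cypher
// Assumes Neo4j 5 syntax; the constraint name is hypothetical.
// Uniqueness applies to the (name, tClass) pair, so two Treatment
// nodes may share a tClass as long as their names differ.
CREATE CONSTRAINT treatment_name_tclass IF NOT EXISTS
FOR (t:Treatment)
REQUIRE (t.name, t.tClass) IS UNIQUE
```

A constraint of this kind covers node properties only; it does not prevent duplicate relationships between two nodes.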

This is what I want: Ideal Clinical Trial Node and Relationship Representation

This is what I get: Actual Incorrect Neo4J Representation when importing from CSV

CodePudding user response:

The problem lies in statements such as MERGE (c1)-[:INTERVENTION]-(t1)-[:IN_CLASS]-(tc1). When merging a path, Neo4j checks whether the whole path already exists; if it does not, it creates the whole path. So when MERGE (c4)-[:INTERVENTION]-(t4)-[:IN_CLASS]-(tc4) is executed, the complete path does not exist yet, and Neo4j creates it even though (t4)-[:IN_CLASS]-(tc4) is already present. That is why you see two relationships between the same Treatment and TreatmentClass nodes. To fix this, split each path MERGE into one MERGE per relationship, like this:

MERGE (c1:ClinicalTrial {name: "CT1"})
MERGE (t1:Treatment {name: "Caffeine", tClass: "Drug"})
MERGE (tc1:TreatmentClass {name: "Drug"})
MERGE (c1)-[:INTERVENTION]-(t1)
MERGE (t1)-[:IN_CLASS]-(tc1)
    
MERGE (c2:ClinicalTrial {name: "CT2"})
MERGE (t2:Treatment {name: "Placebo", tClass: "Drug"})
MERGE (tc2:TreatmentClass {name: "Drug"})
MERGE (c2)-[:INTERVENTION]-(t2)
MERGE (t2)-[:IN_CLASS]-(tc2)
    
MERGE (c3:ClinicalTrial {name: "CT3"})
MERGE (t3:Treatment {name: "Placebo", tClass: "Supplement"})
MERGE (tc3:TreatmentClass {name: "Supplement"})
MERGE (c3)-[:INTERVENTION]-(t3)
MERGE (t3)-[:IN_CLASS]-(tc3)

MERGE (c4:ClinicalTrial {name: "CT4"})
MERGE (t4:Treatment {name: "Placebo", tClass: "Supplement"})
MERGE (tc4:TreatmentClass {name: "Supplement"})
MERGE (c4)-[:INTERVENTION]-(t4)
MERGE (t4)-[:IN_CLASS]-(tc4)
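
Applied to the CSV import itself, the same idea could look like the sketch below. The file name trials.csv and the column names trial, treatment and treatmentClass are hypothetical, since the question does not show the actual file; the relationship directions are also an assumption.

```cypher
// Hypothetical file and column names; adjust them to the real CSV.
LOAD CSV WITH HEADERS FROM 'file:///trials.csv' AS row
// Merge each node on its identifying properties first.
MERGE (c:ClinicalTrial {name: row.trial})
MERGE (t:Treatment {name: row.treatment, tClass: row.treatmentClass})
MERGE (tc:TreatmentClass {name: row.treatmentClass})
// Merge each relationship on its own so that an existing
// relationship between the two bound nodes is reused.
MERGE (c)-[:INTERVENTION]->(t)
MERGE (t)-[:IN_CLASS]->(tc)
```

Because every MERGE matches only a single node or a single relationship between two already bound nodes, a row that repeats an existing Treatment and TreatmentClass pair reuses the existing IN_CLASS relationship instead of adding a second one.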