I am building a graph in Neo4j and run into a problem when importing a sizable number of nodes and relationships from CSV. Here is a recreation of what happens: the example has 4 ClinicalTrial nodes that share Treatment and TreatmentClass nodes and IN_CLASS relationships. When these are created by reading from a CSV, the relationships end up duplicated. Is there a way to use APOC schemas or any other tool to prevent this from happening? I know I can remediate it with Cypher afterwards, but I want to prevent it from happening in the first place.
This Cypher code recreates what occurs when reading from CSV (NOTE: the code below only reproduces what happens sequentially, row by row, during the CSV import):
MERGE (c1:ClinicalTrial {name: "CT1"})
MERGE (t1:Treatment {name: "Caffeine", tClass: "Drug"})
MERGE (tc1:TreatmentClass {name: "Drug"})
MERGE (c1)-[:INTERVENTION]-(t1)-[:IN_CLASS]-(tc1)
MERGE (c2:ClinicalTrial {name: "CT2"})
MERGE (t2:Treatment {name: "Placebo", tClass: "Drug"})
MERGE (tc2:TreatmentClass {name: "Drug"})
MERGE (c2)-[:INTERVENTION]-(t2)-[:IN_CLASS]-(tc2)
MERGE (c3:ClinicalTrial {name: "CT3"})
MERGE (t3:Treatment {name: "Placebo", tClass: "Supplement"})
MERGE (tc3:TreatmentClass {name: "Supplement"})
MERGE (c3)-[:INTERVENTION]-(t3)-[:IN_CLASS]-(tc3)
MERGE (c4:ClinicalTrial {name: "CT4"})
MERGE (t4:Treatment {name: "Placebo", tClass: "Supplement"})
MERGE (tc4:TreatmentClass {name: "Supplement"})
MERGE (c4)-[:INTERVENTION]-(t4)-[:IN_CLASS]-(tc4)
I also tried this APOC schema, but it does not work. A similar schema enforcement without the tClass constraint used to work. It gives me the following error:
ERROR Neo.ClientError.Schema.ConstraintValidationFailed Node(23) already exists with label Treatment and property tClass = 'Drug'
Here is the non-working schema:
// Create Schemas Clinical Trials
CALL apoc.schema.assert(
null,
{ClinicalTrial: ['name'],
Treatment: ['name', 'tClass'],
TreatmentClass: ['name']}
,true
)
Answer:
The problem lies in statements like MERGE (c1)-[:INTERVENTION]-(t1)-[:IN_CLASS]-(tc1). When merging a path, Neo4j checks whether the whole path already exists; if it does not, it creates every relationship in the path (nodes that are already bound are reused). So when MERGE (c4)-[:INTERVENTION]-(t4)-[:IN_CLASS]-(tc4) is executed, the whole path does not exist yet, and Neo4j creates it, even though (t4)-[:IN_CLASS]-(tc4) is already present. That is why you see two relationships between the same Treatment and TreatmentClass nodes. To fix this, simply break the MERGE statements down like this:
MERGE (c1:ClinicalTrial {name: "CT1"})
MERGE (t1:Treatment {name: "Caffeine", tClass: "Drug"})
MERGE (tc1:TreatmentClass {name: "Drug"})
MERGE (c1)-[:INTERVENTION]-(t1)
MERGE (t1)-[:IN_CLASS]-(tc1)
MERGE (c2:ClinicalTrial {name: "CT2"})
MERGE (t2:Treatment {name: "Placebo", tClass: "Drug"})
MERGE (tc2:TreatmentClass {name: "Drug"})
MERGE (c2)-[:INTERVENTION]-(t2)
MERGE (t2)-[:IN_CLASS]-(tc2)
MERGE (c3:ClinicalTrial {name: "CT3"})
MERGE (t3:Treatment {name: "Placebo", tClass: "Supplement"})
MERGE (tc3:TreatmentClass {name: "Supplement"})
MERGE (c3)-[:INTERVENTION]-(t3)
MERGE (t3)-[:IN_CLASS]-(tc3)
MERGE (c4:ClinicalTrial {name: "CT4"})
MERGE (t4:Treatment {name: "Placebo", tClass: "Supplement"})
MERGE (tc4:TreatmentClass {name: "Supplement"})
MERGE (c4)-[:INTERVENTION]-(t4)
MERGE (t4)-[:IN_CLASS]-(tc4)
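Applied to the real import, the same split works per CSV row. The sketch below is a minimal example under assumptions that are not in the question: a file named trials.csv with columns trial, treatment and class (all hypothetical), and Neo4j 5 constraint syntax. It also declares the Treatment constraint as a composite over (name, tClass). The error quoted in the question suggests that apoc.schema.assert created a separate uniqueness constraint for each listed property, so tClass alone had to be unique, which is probably not what was intended.

```cypher
// Assumed Neo4j 5 syntax; the constraint names are made up for this sketch.
CREATE CONSTRAINT clinical_trial_name IF NOT EXISTS
FOR (c:ClinicalTrial) REQUIRE c.name IS UNIQUE;

// Composite uniqueness: one Treatment per (name, tClass) pair,
// so "Placebo"/"Drug" and "Placebo"/"Supplement" can coexist.
CREATE CONSTRAINT treatment_name_class IF NOT EXISTS
FOR (t:Treatment) REQUIRE (t.name, t.tClass) IS UNIQUE;

CREATE CONSTRAINT treatment_class_name IF NOT EXISTS
FOR (tc:TreatmentClass) REQUIRE tc.name IS UNIQUE;

// Hypothetical file and column names: trial, treatment, class.
LOAD CSV WITH HEADERS FROM 'file:///trials.csv' AS row
// Merge each node on its identifying properties.
MERGE (c:ClinicalTrial {name: row.trial})
MERGE (t:Treatment {name: row.treatment, tClass: row.class})
MERGE (tc:TreatmentClass {name: row.class})
// Merge each relationship separately so an existing one is reused.
MERGE (c)-[:INTERVENTION]-(t)
MERGE (t)-[:IN_CLASS]-(tc);
```

Note that the node constraints only guard against duplicate nodes. The duplicate IN_CLASS relationships are avoided by giving each relationship its own MERGE, not by the schema.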