Optimize Neo4j cypher query on huge dataset

Time:12-21

The following query does not finish on a dataset with ~2M nodes. What should I do to make it run faster?

MATCH (cc:ConComp)-[r1:IN_CONCOMP]-(p1:Person)-[r2:SAME_CLUSTER]-(p2:Person)
WHERE cc.cluster_type = "household"
MERGE (cluster:Cluster {CLUSTER_TMP_ID: cc.CONCOMP_ID + '|' + r2.root_id, cluster_type: cc.cluster_type})
MERGE (cluster)-[r3:IN_CLUSTER]-(p1)

CodePudding user response:

A number of suggestions:

  • Add directions to the relationships in the MATCH; this reduces the number of paths that have to be expanded.
  • Make sure you have indexes on all the properties that you MERGE on.
  • Add a direction to the relationship in the second MERGE as well.
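
As a rough sketch of the index suggestion: the first MERGE looks up Cluster nodes by CLUSTER_TMP_ID and cluster_type, so an index covering those two properties should let it avoid scanning every Cluster node. The syntax below assumes Neo4j 4.x or later (older 3.x releases use the CREATE INDEX ON :Label(property) form instead), and the index name cluster_merge_key is made up for illustration.

```cypher
// Assumption: Neo4j 4.x+ syntax; "cluster_merge_key" is a hypothetical index name.
// Composite index on the two properties used in the first MERGE.
CREATE INDEX cluster_merge_key IF NOT EXISTS
FOR (c:Cluster)
ON (c.CLUSTER_TMP_ID, c.cluster_type);
```

Prefixing the query with EXPLAIN or PROFILE shows whether the planner actually uses the index for the MERGE.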

CodePudding user response:

I finally found a solution by using the following query (and by indexing cc.cluster_type and cc.CONCOMP_ID):

CALL apoc.periodic.iterate('MATCH (cc:ConComp)<-[r1:IN_CONCOMP]-(p1:Person)-[r2:SAME_CLUSTER]-(p2:Person) WHERE cc.cluster_type = "household" WITH DISTINCT cc.CONCOMP_ID + "|" + r2.root_id AS id_name, cc.cluster_type AS cluster_type_name, p1 RETURN id_name, cluster_type_name, p1', '
            MERGE (cluster:Cluster {CLUSTER_TMP_ID: id_name, cluster_type: cluster_type_name})
            MERGE (cluster)-[r3:IN_CLUSTER]->(p1)', {batchSize:10000, parallel:false})

Note that I had previously run the query from my question with apoc.periodic.iterate as well, without success.
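
For reference, the two indexes on cc.cluster_type and cc.CONCOMP_ID mentioned in this answer could be created roughly as follows. This is only a sketch: it assumes Neo4j 4.x or later syntax, and the index names are placeholders, not the ones actually used.

```cypher
// Assumption: Neo4j 4.x+ syntax; both index names are hypothetical.
// Index backing the WHERE cc.cluster_type = "household" filter.
CREATE INDEX concomp_cluster_type IF NOT EXISTS
FOR (cc:ConComp)
ON (cc.cluster_type);

// Index on the ConComp identifier used to build CLUSTER_TMP_ID.
CREATE INDEX concomp_id IF NOT EXISTS
FOR (cc:ConComp)
ON (cc.CONCOMP_ID);
```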
