Optimize Neo4j Cypher query on huge dataset

The following query can't run on a dataset with ~2M nodes. What should I do to make it run faster?

MATCH (cc:ConComp)-[r1:IN_CONCOMP]-(p1:Person)-[r2:SAME_CLUSTER]-(p2:Person)
            WHERE cc.cluster_type = "household"
            MERGE (cluster:Cluster {CLUSTER_TMP_ID: cc.CONCOMP_ID + '|' + r2.root_id, cluster_type: cc.cluster_type})
            MERGE (cluster)-[r3:IN_CLUSTER]-(p1)

CodePudding user response：

A number of suggestions:

Add directions to your relationships; this reduces the number of paths the MATCH has to expand.
Make sure you have indexes on all properties that you MERGE on.
Add a direction to the relationship in the second MERGE as well.
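
As a rough sketch of the indexing suggestion: the MERGE in the question matches Cluster nodes on CLUSTER_TMP_ID and cluster_type, and the WHERE clause filters ConComp nodes on cluster_type. The statements below assume Neo4j 4.1 or later (older versions use a different CREATE INDEX syntax), and the index names are made up for illustration.

```cypher
// Assumed: Neo4j 4.1+ syntax; index names are hypothetical.

// Composite index covering both properties used in the Cluster MERGE.
CREATE INDEX cluster_tmp_id_type IF NOT EXISTS
FOR (c:Cluster) ON (c.CLUSTER_TMP_ID, c.cluster_type);

// Index supporting the WHERE cc.cluster_type = "household" filter.
CREATE INDEX concomp_cluster_type IF NOT EXISTS
FOR (cc:ConComp) ON (cc.cluster_type);
```

Without an index on the Cluster properties, each MERGE would likely have to scan all existing Cluster nodes to check for a match, which gets slower as more clusters are created.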

CodePudding user response：

I finally found a solution by using the following query (and by indexing cc.cluster_type and cc.CONCOMP_ID):

CALL apoc.periodic.iterate('MATCH (cc:ConComp)<-[r1:IN_CONCOMP]-(p1:Person)-[r2:SAME_CLUSTER]-(p2:Person) WHERE cc.cluster_type = "household" WITH DISTINCT cc.CONCOMP_ID + "|" + r2.root_id AS id_name, cc.cluster_type AS cluster_type_name, p1 RETURN id_name, cluster_type_name, p1', '
            MERGE (cluster:Cluster {CLUSTER_TMP_ID: id_name, cluster_type: cluster_type_name})
            MERGE (cluster)-[r3:IN_CLUSTER]->(p1)', {batchSize:10000, parallel:false})

Note that I had previously run the query from my original question with apoc.periodic.iterate, without success.