Home > Software design >  Cypher: slow query optimization
Cypher: slow query optimization

Time:10-25

I am using redisgraph with a custom implementation of ioredis. The query runs 3 to 6 seconds on a database that has millions of nodes. It basically filters (b:brand) by different relationship counts by adding the following match and where multiple times on different nodes.

(:brand) - 1mil nodes
(:w) - 20mil nodes
(:e) - 10mil nodes
// matching b before this codeblock
MATCH (b)-[:r1]->(p:p)<-[:r2]-(w:w)
WHERE w.deleted IS NULL
WITH count(DISTINCT w) as count, b
WHERE  count >= 0   AND count <= 10

The full query would look like this.

MATCH (b:brand)
WHERE  b.deleted IS NULL
MATCH (b)-[:r1]->(p:p)<-[:r2]-(w:w)
WHERE w.deleted IS NULL
WITH count(DISTINCT w) as count, b
WHERE  count >= 0   AND count <= 10
MATCH (c)-[:r3]->(d:d)<-[:r4]-(e:e)
WHERE e.deleted IS NULL
WITH count(DISTINCT e) as count, b
WHERE  count >= 0   AND count <= 10
WITH b ORDER by b.name asc
WITH count(b) as totalCount, collect({id: b.id)[$cursor..($cursor $limit)] AS brands
RETURN brands, totalCount

How can I optimize this query as it's really slow?

CodePudding user response:

A few thoughts:

  • Property lookups are expensive; is there a way you can get around all the .deleted checks?
  • If possible, can you avoid naming r1, r2, etc.? It's faster when it doesn't have to check the relationship type.
  • You're essentially traversing the entire graph several times. If the paths b-->p<--w and c-->d<--e don't overlap, you can include them both in the match statement, separated by a comma, and aggregate both counts at once
  • I don't know if it'll help much, but you don't need to name p and d since you never refer to them
  • This is a very small improvement, but I don't see a reason to check count >= 0

Also, I'm sure you have your reasons, but why does the c-->d<--e path matter? This would make more sense to me if it were b-->d<--e to mirror the first portion.

  • Related