Say i have the following table, called graph:
_id |
relates |
---|---|
1 |
{2, 3} |
2 |
{4} |
3 |
{5, 6} |
4 |
{3, 7} |
5 |
{} |
6 |
{} |
7 |
{} |
Here it is in graph form: graph
My problem is to write a recursive query to find the shortest path between two nodes. In this example i will aim to find the shortest path between nodes 1 and 3.
The following query works fine and solves the problem.
with recursive pathFrom1to3(_id, relates, lvl, _path) as
(
select _id, relates, 1 as lvl, ARRAY[_id]
from graph
where _id = 1
union
select g._id, g.relates, p.lvl 1 as lvl, _path || g._id
from pathFrom1to3 p join graph g on g._id = any(p.relates)
where not g._id = any(p._path) -- handles cycles (not applicable for this example)
)
select * from pathFrom21
where _id = 3
limit 1
The recursive query actually finds all possible paths from starting from node 1, until it cannot find any more paths, so after the recursion is over, we are left with this table:
_id |
relates |
lvl |
_path |
---|---|---|---|
1 |
{2, 3} |
1 |
{1} |
2 |
{4} |
2 |
{1, 2} |
3 |
{5, 6} |
2 |
{1, 3} |
4 |
{3, 7} |
3 |
{1, 2, 4} |
5 |
{} |
3 |
{1, 3, 5} |
6 |
{} |
3 |
{1, 3, 6} |
3 |
{5, 6} |
4 |
{1, 2, 4, 3} |
7 |
{} |
4 |
{1, 2, 4, 7} |
5 |
{} |
5 |
{1, 2, 4, 3, 5} |
6 |
{} |
5 |
{1, 2, 4, 3, 6} |
After we filter out for _id = 3
we get:
_id |
relates |
lvl |
_path |
---|---|---|---|
3 |
{5, 6} |
2 |
{1, 3} |
3 |
{5, 6} |
4 |
{1, 2, 4, 3} |
And it is always the case that the shortest path is the path that we hit first (since the edges have no weight). With the logic of our query that would be equivalent to the earliest returned record, so we can use LIMIT for this: limit 1
_id |
relates |
lvl |
_path |
---|---|---|---|
3 |
{5, 6} |
2 |
{1, 2, 4} |
And there is the shortest path from node 1 to 3.
My issue is the fact that this query computes all paths from node 1, making it awfully inefficient if the target node is near the start of the search. My goal is to make the query stop searching (completely) right after it hits the target node, since we know there will not be any other shortest path to come.
I need a way to terminate the recursion completely when node 3 is reached.
If i try to add a terminator for the node itself, such as the query below, the recursion for the other nodes at the same level continues, since the terminating condition is still satisfied:
with recursive pathFrom1to3(_id, relates, lvl, _path) as
(
select _id, relates, 1 as lvl, ARRAY[_id]
from graph
where _id = 1
union
select g._id, g.relates, p.lvl 1 as lvl, _path || g._id
from pathFrom1to3 p join graph g on g._id = any(p.relates)
where not g._id = any(p._path) -- handles cycles (not applicable for this example)
**and not p._id = 3**
)
select * from pathFrom21
produces:
_id | relates | lvl | _path |
---|---|---|---|
1 |
{2, 3} |
1 |
{1} |
2 |
{4} |
2 |
{1, 2} |
3 |
{5, 6} |
2 |
{1, 3} |
4 |
{3, 7} |
3 |
{1, 2, 4} |
3 |
{5, 6} |
4 |
{1, 2, 4, 3} |
7 |
{} |
4 |
{1, 2, 4, 7} |
5 |
{} |
5 |
{1, 2, 4, 3, 5} |
6 |
{} |
5 |
{1, 2, 4, 3, 6} |
Notice that the recursion stops for when node 3 is reached the first time; nodes 5 and 6 doesn't get searched further because node 3 was hit. But the recursion continues for node 4 (from node 2) because the p._id is not 3, it is 2.
We need a way to terminate the recursion when it reaches node 3 for the entire level.
My idea is to create a reached column which is 0 when the _id is not 3 and 1 when the _id is 3. Then we can use SUM to check the sum of the reached values for the entire level and if it is not 0 then we terminate, but i am struggling to write this as a query.
here is my attempt at writing it:
with recursive pathFrom1to3(_id, relates, lvl, _path, reached, lvlReached) as
(
select _id, relates, 1 as lvl, ARRAY[_id], 0 as reached, 0 as lvlReached
from graph
where _id = 1
union
select _id, relates, lvl, _path, reached, lvlReached from
(
select g._id, g.relates, p.lvl 1 as lvl, _path || g._id,
case when 3 = any(g.relates) then 1 else 0 end as reached
from pathFrom1to3 p join graph g on g._id = any(p.relates)
where not g._id = any(p._path) -- handles cycles
) mainRecursion
join
(
select lvl, sum(reached) as lvlReached
from pathFrom21
group by lvl
) lvlReached
on mainRecursion.lvl = lvlReach.lvl
where lvlReached = 0
)
select * from pathFrom21
This gives me the error:
recursive reference to query "pathFrom1to3" must not appear more than once.
CodePudding user response:
You can search from the bottom up, working backwards from 3
to 1
, thus decreasing the search space:
with recursive cte(id, vals) as (
select g._id, array[3, g._id] from graph g where 3 = any(g.relates)
union all
select g._id, vals||g._id from cte c join graph g
on c.id = any(g.relates) and not g._id = any(c.vals)
)
select * from cte where id = 1
CodePudding user response:
We can try a window function to hold the status indicating the solution is found, causing all further recursive iterations to be pruned.
WITH RECURSIVE cte(id, vals, relates, done) AS (
SELECT g._id, array[g._id], relates, null::int
FROM graph AS g
WHERE _id = 1
UNION ALL
SELECT g._id, vals||g._id, g.relates
, MAX(CASE WHEN 3 IN (done, g._id) THEN 3 END) OVER ()
FROM cte AS c
JOIN graph AS g
ON g._id = any(c.relates)
AND NOT g._id = any(c.vals) -- Avoid cycles
AND done IS NULL -- wait until we're done
)
SELECT * FROM cte
WHERE id = 3
;
The result:
id | vals | relates | done |
---|---|---|---|
3 | {1,3} | {5,6} | 3 |
By commenting out the id = 3
logic in the last query expression, we see all the generated rows:
id | vals | relates | done |
---|---|---|---|
1 | {1} | {2,3} | null |
2 | {1,2} | {4} | 3 |
3 | {1,3} | {5,6} | 3 |