Let's say I have a database table with the following content:
person1_id | person2_id |
---|---|
1 | 2 |
3 | 4 |
2 | 3 |
(this table continues for about 60k rows with pairs of people, the pairs can occur multiple times and aren't set in stone, so person 1 can also be paired with person 3)
I need to know the amount of steps a person is away from another, so for example how many steps away is 1 from 4 it would take 3 steps (1->2->3->4).
As an output I'd expect something alike:
person1_id | person2_id | steps |
---|---|---|
1 | 4 | 3 |
From what I found it should be possible using recursive queries, but I have no idea how to approach the problem
CodePudding user response:
In a recursive query, you can identify two steps:
- Base Step: contains existing rows, according to which we want to recur on.
- Recursive Step: contains new rows, generated from the Base Step at timestep 0, and from the both Base and Recursive Step at timesteps 1 .
"Base" Step
First we want to select the relevant rows to build our Base Step. For this one, I'm assuming you don't want rows like "(2, 4, 2)" and "(3, 4, 1)", so basically you always want "person1_id" to not appear among "person2_id" values.
SELECT person1_id FROM tab
EXCEPT
SELECT person2_id FROM tab
person1_id |
---|
1 |
This will give us just the "person1_id", hence in order to get back the full rows which have these ids, you can do a JOIN
operation as follows:
SELECT tab.person1_id, tab.person2_id, 1 AS steps
FROM tab
INNER JOIN (SELECT person1_id FROM tab
EXCEPT
SELECT person2_id FROM tab) p1_never_p2
ON tab.person1_id = p1_never_p2.person1_id
person1_id | person2_id | steps |
---|---|---|
1 | 4 | 1 |
Note that "steps" field will be always initialized to 1 here, as long as we are in Base Step (the distance between any row belonging to the original table between "person1_id" and "person2_id" is always 1).
"Recursive" Step
In this step we want to match the current table's "person2_id" with the original table's "person1_id", while keeping current table's "person1_id" and original table's "person2_id" (if I have 1-2 in the current and 2-3 in the original, I want to keep 1-3 because both rows match on the 2). The current output will be kept by the recursive query, and that's what we find in the JOIN
with the original table.
WITH RECURSIVE cte AS (
SELECT tab.person1_id, tab.person2_id, 1 AS steps
FROM tab
INNER JOIN (SELECT person1_id FROM tab
EXCEPT
SELECT person2_id FROM tab) p1_never_p2
ON tab.person1_id = p1_never_p2.person1_id
UNION ALL
SELECT cte.person1_id, tab.person2_id, cte.steps 1 AS steps
FROM cte
INNER JOIN tab
ON cte.person2_id = tab.person1_id
)
Note that this subquery is called "cte", though it is called inside, within the FROM clause, to apply the INNER JOIN
. Also note that we increment the steps value that we keep updating inside "cte".
person1_id | person2_id | steps |
---|---|---|
1 | 2 | 1 |
1 | 3 | 2 |
1 | 4 | 3 |
Basically what you see here is the base step (row 1) and every recursive step (rows 2 and 3). We have all rows from both steps because of the UNION ALL
operation.
Since we want only the combinations with the highest amount of step for each "person1_id" value, we can use the FETCH FIRST ROWS WITH TIES
constraint as follows:
WITH RECURSIVE cte AS (
SELECT tab.person1_id, tab.person2_id, 1 AS steps
FROM tab
INNER JOIN (SELECT person1_id FROM tab
EXCEPT
SELECT person2_id FROM tab) p1_never_p2
ON tab.person1_id = p1_never_p2.person1_id
UNION ALL
SELECT cte.person1_id, tab.person2_id, cte.steps 1 AS steps
FROM cte
INNER JOIN tab
ON cte.person2_id = tab.person1_id
)
SELECT *
FROM cte
ORDER BY ROW_NUMBER() OVER(PARTITION BY person1_id ORDER BY steps DESC)
FETCH FIRST ROWS WITH TIES
The ROW_NUMBER
window function will generate the value 1 for each "person1_id" biggest amount of steps, then the FETCH FIRST ROWS WITH TIES
will keep the rows whose ROW_NUMBER
is equal to 1, tied.
person1_id | person2_id | steps |
---|---|---|
1 | 4 | 3 |
Check the demo here.