Home > Mobile >  Finding the amount of steps between values in SQL
Finding the amount of steps between values in SQL

Time:10-13

Let's say I have a database table with the following content:

person1_id person2_id
1 2
3 4
2 3

(this table continues for about 60k rows with pairs of people, the pairs can occur multiple times and aren't set in stone, so person 1 can also be paired with person 3)

I need to know the amount of steps a person is away from another, so for example how many steps away is 1 from 4 it would take 3 steps (1->2->3->4).

As an output I'd expect something alike:

person1_id person2_id steps
1 4 3

From what I found it should be possible using recursive queries, but I have no idea how to approach the problem

CodePudding user response:

In a recursive query, you can identify two steps:

  • Base Step: contains existing rows, according to which we want to recur on.
  • Recursive Step: contains new rows, generated from the Base Step at timestep 0, and from the both Base and Recursive Step at timesteps 1 .

"Base" Step

First we want to select the relevant rows to build our Base Step. For this one, I'm assuming you don't want rows like "(2, 4, 2)" and "(3, 4, 1)", so basically you always want "person1_id" to not appear among "person2_id" values.

SELECT person1_id FROM tab 
EXCEPT
SELECT person2_id FROM tab
person1_id
1

This will give us just the "person1_id", hence in order to get back the full rows which have these ids, you can do a JOIN operation as follows:

SELECT tab.person1_id, tab.person2_id, 1 AS steps
FROM       tab 
INNER JOIN (SELECT person1_id FROM tab 
            EXCEPT
            SELECT person2_id FROM tab) p1_never_p2
        ON tab.person1_id = p1_never_p2.person1_id
person1_id person2_id steps
1 4 1

Note that "steps" field will be always initialized to 1 here, as long as we are in Base Step (the distance between any row belonging to the original table between "person1_id" and "person2_id" is always 1).


"Recursive" Step

In this step we want to match the current table's "person2_id" with the original table's "person1_id", while keeping current table's "person1_id" and original table's "person2_id" (if I have 1-2 in the current and 2-3 in the original, I want to keep 1-3 because both rows match on the 2). The current output will be kept by the recursive query, and that's what we find in the JOIN with the original table.

WITH RECURSIVE cte AS (
    SELECT tab.person1_id, tab.person2_id, 1 AS steps
    FROM       tab 
    INNER JOIN (SELECT person1_id FROM tab 
                EXCEPT
                SELECT person2_id FROM tab) p1_never_p2
            ON tab.person1_id = p1_never_p2.person1_id

    UNION ALL

    SELECT cte.person1_id, tab.person2_id, cte.steps   1 AS steps
    FROM       cte
    INNER JOIN tab
            ON cte.person2_id = tab.person1_id
)

Note that this subquery is called "cte", though it is called inside, within the FROM clause, to apply the INNER JOIN. Also note that we increment the steps value that we keep updating inside "cte".

person1_id person2_id steps
1 2 1
1 3 2
1 4 3

Basically what you see here is the base step (row 1) and every recursive step (rows 2 and 3). We have all rows from both steps because of the UNION ALL operation.


Since we want only the combinations with the highest amount of step for each "person1_id" value, we can use the FETCH FIRST ROWS WITH TIES constraint as follows:

WITH RECURSIVE cte AS (
    SELECT tab.person1_id, tab.person2_id, 1 AS steps
    FROM       tab 
    INNER JOIN (SELECT person1_id FROM tab 
                EXCEPT
                SELECT person2_id FROM tab) p1_never_p2
            ON tab.person1_id = p1_never_p2.person1_id
    UNION ALL
    SELECT cte.person1_id, tab.person2_id, cte.steps   1 AS steps
    FROM       cte
    INNER JOIN tab
            ON cte.person2_id = tab.person1_id
)
SELECT * 
FROM cte
ORDER BY ROW_NUMBER() OVER(PARTITION BY person1_id ORDER BY steps DESC)
FETCH FIRST ROWS WITH TIES

The ROW_NUMBER window function will generate the value 1 for each "person1_id" biggest amount of steps, then the FETCH FIRST ROWS WITH TIES will keep the rows whose ROW_NUMBER is equal to 1, tied.

person1_id person2_id steps
1 4 3

Check the demo here.

  • Related