I am having trouble adapting some recursive CTE code from PostgreSQL to SQL Server.
Here is my table:
CREATE TABLE flight (
src CHAR(3)
, dest CHAR(3)
, stt DATETIME
, endt DATETIME);
INSERT INTO flight VALUES
('MSP', 'SLC', '2022-10-02 11:45:00', '2022-10-02 14:10:00'),
('SLC', 'LAX', '2022-10-02 15:20:00', '2022-10-02 17:45:00'),
('MSP', 'LAX', '2022-10-02 12:15:00', '2022-10-02 15:05:00')
and what I am trying to adapt:
WITH RECURSIVE flight_paths (src, flights, path, dest, stt, endt) AS (
SELECT
src
, ARRAY[src || '-' || dest]
, ARRAY[src]
, dest
, stt
, endt
FROM flight
UNION ALL
SELECT
fp.src
, fp.flights || (f.src || '-' || f.dest)
, fp.path || f.src
, f.dest
, fp.stt
, f.endt
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest
WHERE NOT f.src = ANY(fp.path)
AND NOT 'LAX' = ANY(fp.path)
AND f.stt > fp.endt
)
SELECT flights, stt, endt, path[2:] stopovers
FROM flight_paths
WHERE src = 'MSP' AND dest = 'LAX'
I have been having issues adapting the use of ARRAYs. Any pointers would be really helpful!
CodePudding user response:
Unfortunately, SQL Server does not support arrays. But you could use a JSON array instead.
A pretty straight-up copy of your original code results in this
WITH flight_paths (src, flights, path, dest, stt, endt) AS (
SELECT
src
, CAST('["' src '-' dest '"]' AS nvarchar(max))
, CAST('["' src '"]' AS nvarchar(max))
, dest
, stt
, endt
FROM flight
UNION ALL
SELECT
fp.src
, JSON_MODIFY(fp.flights, 'append $', f.src '-' f.dest)
, JSON_MODIFY(fp.path, 'append $', f.src)
, f.dest
, fp.stt
, f.endt
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest AND f.dest <> fp.src
WHERE EXISTS (SELECT 1
FROM OPENJSON(fp.path) arr
WHERE arr.value NOT IN(f.src, 'LAX')
)
AND f.stt > fp.endt
)
SELECT flights, stt, endt, path stopovers
FROM flight_paths
WHERE src = 'MSP'
AND dest = 'LAX'
Some minor changes in the logic were necessary:
- It's difficult to delete an element from a JSON array. So instead the
path
only builds the intermediate nodes, and a separate check is made to exclude the start point.
I note that it's probably faster to check explicitly that the destination has not been reached, and also to push the start point into the anchor part of the CTE:
WITH flight_paths (src, flights, path, dest, stt, endt) AS (
SELECT
src
, CAST('["' src '-' dest '"]' AS nvarchar(max))
, CAST('["' src '"]' AS nvarchar(max))
, dest
, stt
, endt
FROM flight
WHERE src = 'MSP'
UNION ALL
SELECT
fp.src
, JSON_MODIFY(fp.flights, 'append $', f.src '-' f.dest)
, JSON_MODIFY(fp.path, 'append $', f.src)
, f.dest
, fp.stt
, f.endt
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest AND f.dest <> fp.src
AND fp.dest <> 'LAX'
WHERE EXISTS (SELECT 1
FROM OPENJSON(fp.path) arr
WHERE arr.value <> f.src
)
AND f.stt > fp.endt
)
SELECT flights, stt, endt, path stopovers
FROM flight_paths
WHERE dest = 'LAX';
CodePudding user response:
Much more quicker will be :
WITH flight_paths (src, flights, path, dest, stt, endt) AS (SELECT
src
, CAST('{"' src '-' dest '"}' AS nvarchar(max))
, CAST('{"' src '"}' AS nvarchar(max))
, dest
, stt
, endt
FROM flight
WHERE src = 'MSP'
UNION ALL
SELECT
fp.src
, fp.flights '{"' f.src '"}' '{"' f.dest '"}'
, fp.path '{"' f.src '"}'
, f.dest
, fp.stt
, f.endt
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest AND f.dest <> fp.src
AND fp.dest <> 'LAX'
WHERE fp.path NOT LIKE '%' '{"' f.src '"}' '%'
AND f.stt > fp.endt
)
SELECT flights, stt, endt, path stopovers
FROM flight_paths
WHERE dest = 'LAX';
Tested with 100003 rows, the results are :
- Charlieface Query 1 : 5' 05"
- Charlieface Query 2 : 4' 58"
- My own one : 0' 01"
Running on a Windows 2019 Developper / PC with 256 Go RAM and 24 cores x 2 intel Xeon CPU E5-1650 v4
Conclusion : JSON or XML are very costly in recursive queries...