How do I optimize hierarchical query-CodePudding

I've been trying to run my query but it always seem to be running like forever. Would like to know how can I Improve on performance for this Had to add in the group by Clause as the left join seem to be creating duplication when viewing the results.

Hierarchical Query Code

with NextValue (equip_id, link_id, node_fr_id, node_to_id ) as
(
    SELECT lm.equipment_id, l.link_id, l.node_fr_id_num, l.node_to_id_num
    FROM road_details l
    LEFT JOIN Equipment_details e ON l.link_id = e.link_id
    Where 1=1
    START WITH l.link_id = 2
    CONNECT By NOCYCLE Prior l.node_to_id_num = l.node_fr_id_num  
)
Select * from NextValue 

-- Select Particular Equipment starting with 'DET' --
Where 1=1 and equip_id like 'DET%'

-- Remove Duplicate cause by Join clause --
Group by  equip_id, link_id, node_fr_id, node_to_id;

Expected output:

equip_id	link_id	node_fr_id	node_to_id
DET_276627	990	1138	1141
DET_546012	1881	4856	2322
DET_546010	1980	2397	2437
DET_526526	13176	14259	14260
DET_526014	13724	14762	14763
DET_536513	114721	114445	114446
DET_526524	106130	105001	106402
DET_516043	18803	19196	19198
DET_526024	106319	106753	105919
DET_526515	105796	105762	103751
DET_526518	1061319	106514	104601
DET_526037	99221	96871	94382
DET_526510	99185	97133	97881

Current output:

equip_id	link_id	node_fr_id	node_to_id
DET_276627	990	1138	1141
DET_546012	1881	4856	2322
DET_546010	1980	2397	2437
DET_526526	13176	14259	14260
DET_526014	13724	14762	14763
DET_536513	114721	114445	114446
DET_526524	106130	105001	106402
DET_516043	18803	19196	19198
DET_526024	106319	106753	105919
DET_526515	105796	105762	103751
DET_526518	1061319	106514	104601
DET_526518	1061319	106514	104601
DET_526518	1061319	106514	104601
DET_526518	1061319	106514	104601
DET_526518	1061319	106514	104601
DET_526518	1061319	106514	104601
DET_526518	1061319	106514	104601
DET_526037	99221	96871	94382
DET_526510	99185	97133	97881
DET_526510	99185	97133	97881
DET_526510	99185	97133	97881
DET_526510	99185	97133	97881

for odd reason it creating duplicates and hence had to add in the Group clause to counter this issue.. but with the group clause added in i've notice the longer querying duration occuring.

Updates for commenters (not enough space to explain): Hi @HimanshuKandpal, let me just explain the query and what it is suppose to do. There are 2 tables, one is road_details and the other is an equipment_details. PK for the road detailsis Link_id, where equipment_id is the PK in the equipment_details and link_id is FK.

Brief explanation road details for context of what this hieratical query is aiming to do. Picture a map of inter connecting roads, e.g. a single road is made up of multiple links (Blocks) each block has a fr_id (head) and to_id (tail) [link_id 1 -> link_id 2 -> link_id 3] where link_id1.to_id = link_id2.fr_id as they are linked that way. The query is supposed to find out what is the next connecting link based on the criteria.

2nd criteria, with each link can have multiple equipment on it or non at all. If a user were to select a certain equipment type, as shown in my where clause are they able to know what is the next or previous link ID of a particular equipment with support from hieratical query.

CodePudding user response：

The columns you SELECT are all from the road_details table (assuming lm.equipment_id is a typo as there is no lm table alias and it should be l.equipment_id) and the WHERE filter is a column from the road_details table and the START WITH and CONNECT BY conditions are all using columns from the road_details table.

You LEFT OUTER JOIN the Equipment_details table which means that having matching rows is optional and you NEVER use any of its columns to either filter the input or generate the output so the Equipment_details table appears to be irrelevant to the output. At best, the SQL engine will optimise query and remove the join; at worst, there are multiple matching rows in the Equipment_details table for each row in the road_details table which will generate duplicates and then this number of duplicates will be exponentially magnified as the paths are followed deeper into the hierarchy.

For example, if you have the path without equipment:

A -> B -> C

And compare it to having the same path with 2 equipment on each road:

A (with equipment A1) -> B (with equipment B1) -> C (with equipment C1)
A (with equipment A1) -> B (with equipment B1) -> C (with equipment C2)
A (with equipment A1) -> B (with equipment B2) -> C (with equipment C1)
A (with equipment A1) -> B (with equipment B2) -> C (with equipment C2)
A (with equipment A2) -> B (with equipment B1) -> C (with equipment C1)
A (with equipment A2) -> B (with equipment B1) -> C (with equipment C2)
A (with equipment A2) -> B (with equipment B2) -> C (with equipment C1)
A (with equipment A2) -> B (with equipment B2) -> C (with equipment C2)

The query will generate 8 duplicates that terminate a C (and it will get worse as the hierarchy deepens or if multiple paths reach the same junction).

You will still need to filter out duplicates as you can follow roads A -> B -> C and also A -> D -> C so C would be reached by two distinct paths without any duplication from equipment but including equipment will make the duplicates much, much more prevalent.

Since you never use the Equipment_details table, do not include it in the query and just use the road_details table:

SELECT DISTINCT
       l.equipment_id,
       l.link_id,
       l.node_fr_id_num,
       l.node_to_id_num
FROM   road_details l
WHERE  l.equipment_id like 'DET%'
START WITH
       l.link_id = 2
CONNECT BY NOCYCLE
       PRIOR l.node_to_id_num = l.node_fr_id_num

(Note: using either GROUP BY with all the columns or DISTINCT will be doing the same job and performance should be similar for both and can be used interchangeably; you should profile each version and check. DISTINCT may be easier for future developers to comprehend what you are trying to achieve.)

If the equipment_id column is in the Equipment_details table then you can generate the road hierarchy and then later join the equipment:

SELECT e.equipment_id,
       l.*
FROM   ( SELECT DISTINCT
                link_id,
                node_fr_id_num,
                node_to_id_num
         FROM   road_details
         START WITH
                link_id = 2
         CONNECT BY NOCYCLE
                PRIOR node_to_id_num = node_fr_id_num
       ) l
       INNER JOIN Equipment_Details e
       ON (l.link_id = e.link_id)
WHERE  e.equipment_id like 'DET%'

or:

SELECT e.equipment_id,
       l.*
FROM   ( SELECT DISTINCT
                link_id,
                node_fr_id_num,
                node_to_id_num
         FROM   road_details
         START WITH
                link_id = 2
         CONNECT BY NOCYCLE
                PRIOR node_to_id_num = node_fr_id_num
       ) l
       LEFT OUTER JOIN Equipment_Details e
       ON (   l.link_id = e.link_id
          AND e.equipment_id like 'DET%')