Home > Blockchain >  SQL join to include all records on the left most table and only show records from the two right tabl
SQL join to include all records on the left most table and only show records from the two right tabl

Time:07-02

I need assistance with my join and where condition.

I would like all records in hpd_help_desk displayed.

Tables hpd_associations and chg_infrastructure_change have multiple records per hpd_help_desk.INCIDENT_NUMBER so duplicate rows are being returned. Duplicates need to be removed.

Table hpd_associations has only been added to access records in chg_infrastructure_change

Suggested Logic to remove duplicates:

  1. From Table chg_infrastructure_change select only rows where field chg_request has a CRQ number AND HPD_ASSOCIATIONS_Request = 17000

  2. If there hpd_associations.association_type01 has multiple rows per hpd_help_desk.INCIDENT_NUMBER and no chg_request CRQ's / HPD_ASSOCIATIONS_Request = 17000, simply take the first row.

SQL:

SELECT  distinct(hpd_help_desk.INCIDENT_NUMBER),
        
        hpd_associations.association_type01                 as "HPD_ASSOCIATIONS_Request",
        
        chg_infrastructure_change.request_id                as "CHG_Request",
        chg_infrastructure_change.status_reason             as "CHG_Status_Reason",
        chg_infrastructure_change.description2              as "CHG__Desc2",  
        chg_infrastructure_change.infrastructure_change_id  as "CHG_infrastructure_change_id"  
    
FROM    helix_access.hpd_help_desk
        LEFT OUTER JOIN helix_access.hpd_associations ON (hpd_help_desk.incident_number = hpd_associations.request_id02)
        LEFT OUTER JOIN helix_access.chg_infrastructure_change ON (chg_infrastructure_change.infrastructure_change_id  = hpd_associations.request_id01)                                            

WHERE   hpd_associations.association_type01 = 17000 -- 17000 vale = "Caused By" There are 83 values for association_type01

Desired output would be as below, indication of keep and remoe rows included:

incident_number hpd_associations_request    chg_request chg_status_reason   chg_infrastructure_change_id
INC000037937498 35000   NULL    NULL    NULL --Keep
INC000037942103 17000   CRQ000000755311 6000    CRQ000000889056 --Keep
INC000037942103 35000   CRQ000000710522 5000    CRQ000000834307 -- Remove
INC000037942103 35000   NULL    NULL    NULL -- Remove
INC000037951750 17000   CRQ000000752354 5000    CRQ000000885032 -- Keep
INC000037984428 35000   NULL    NULL    NULL -- Keep
INC000037985037 35000   NULL    NULL    NULL -- Keep
INC000037985037 30000   NULL    NULL    NULL -- Remove

Alternatively should there just be a subquery to get all records from chg_infrastructure_change where field chg_request has a CRQ number AND HPD_ASSOCIATIONS_Request = 17000 and do a left join?

i.e.

WHERE
.
.
.
        and hpd_help_desk.INCIDENT_NUMBER in (
                                                SELECT  
                                                        
                                                        distinct(hpd_associations.request_id02)
                                                     --   hpd_associations.association_type01                 as "HPD_ASSOCIATIONS_Request",
                                                       -- chg_infrastructure_change.request_id                as "CHG_Request"
                                                        
                                                FROM    helix_access.hpd_associations 
                                                            LEFT OUTER JOIN helix_access.chg_infrastructure_change ON (chg_infrastructure_change.infrastructure_change_id  = hpd_associations.request_id01)                                            
                                                
                                                WHERE   From_unixtime(Cast(hpd_associations.start_date_01 AS BIGINT),'yyyy-MM-dd HH:mm:ss') >= '2022-03-01 00:00:00.000'
                                                        AND hpd_associations.association_type01 = 17000 -- 17000 vale = "Caused By" There are 83 values for association_type01
                                              )          

Peter

CodePudding user response:

When dealing with outer joins you need to cater for the possibility of NULLs in columns from those joined tables. This is especially true when it comes to the where clause because if you don't allow for NULLs when referring to outer joined tables you create the conditions that equate to an inner join.

e.g.

SELECT
      hpd_help_desk.INCIDENT_NUMBER
    , hpd_associations.association_type01 AS "HPD_ASSOCIATIONS_Request"
    , chg_infrastructure_change.request_id AS "CHG_Request"
    , chg_infrastructure_change.status_reason AS "CHG_Status_Reason"
    , chg_infrastructure_change.description2 AS "CHG__Desc2"
    , chg_infrastructure_change.infrastructure_change_id AS "CHG_infrastructure_change_id"
FROM helix_access.hpd_help_desk
LEFT OUTER JOIN helix_access.hpd_associations
    ON hpd_help_desk.incident_number = hpd_associations.request_id02
LEFT OUTER JOIN helix_access.chg_infrastructure_change
    ON chg_infrastructure_change.infrastructure_change_id = hpd_associations.request_id01
WHERE (hpd_associations.association_type01 = 17000 
         OR hpd_associations.association_type01 IS NULL)

Also, an alternative to using the where clause you can extend the join conditions instead e.g.

LEFT OUTER JOIN helix_access.hpd_associations
    ON hpd_help_desk.incident_number = hpd_associations.request_id02
    AND hpd_associations.association_type01 = 17000
LEFT OUTER JOIN helix_access.chg_infrastructure_change
   ON chg_infrastructure_change.infrastructure_change_id  = hpd_associations.request_id01
   AND chg_infrastructure_change.CRQ IS NOT NULL
   AND chg_infrastructure_change.HPD_ASSOCIATIONS_Request = 17000

This will limit the rows included in the result to just those rows that meet ALL conditions in each join.

Regarding "If there hpd_associations.association_type01 has multiple rows per hpd_help_desk.INCIDENT_NUMBER and no chg_request CRQ's / HPD_ASSOCIATIONS_Request = 17000, simply take the first row." This may require use of row_number() over() and to do that would require you to join to a subquery instead of joining diretly to the table. e.g.

LEFT OUTER JOIN (
    SELECT *
    , row_number() over(partition by request_id02
                        order by CRQ) as rn
    FROM helix_access.hpd_associations
    AND hpd_associations.association_type01 = 17000
    ) AS assocs ON hpd_help_desk.incident_number = assocs.request_id02
                AND assocs.rn = 1 /* this is how you get just one row (for each request_id02) */
  • Related