I need assistance with my join and where condition.
I would like all records in hpd_help_desk
displayed.
Tables hpd_associations
and chg_infrastructure_change
have multiple records per hpd_help_desk.INCIDENT_NUMBER
so duplicate rows are being returned. Duplicates need to be removed.
Table hpd_associations
has only been added to access records in chg_infrastructure_change
Suggested Logic to remove duplicates:
From Table
chg_infrastructure_change
select only rows where fieldchg_request
has aCRQ
number ANDHPD_ASSOCIATIONS_Request
=17000
If there
hpd_associations.association_type01
has multiple rows perhpd_help_desk.INCIDENT_NUMBER
and nochg_request
CRQ's /HPD_ASSOCIATIONS_Request
=17000
, simply take the first row.
SQL:
SELECT distinct(hpd_help_desk.INCIDENT_NUMBER),
hpd_associations.association_type01 as "HPD_ASSOCIATIONS_Request",
chg_infrastructure_change.request_id as "CHG_Request",
chg_infrastructure_change.status_reason as "CHG_Status_Reason",
chg_infrastructure_change.description2 as "CHG__Desc2",
chg_infrastructure_change.infrastructure_change_id as "CHG_infrastructure_change_id"
FROM helix_access.hpd_help_desk
LEFT OUTER JOIN helix_access.hpd_associations ON (hpd_help_desk.incident_number = hpd_associations.request_id02)
LEFT OUTER JOIN helix_access.chg_infrastructure_change ON (chg_infrastructure_change.infrastructure_change_id = hpd_associations.request_id01)
WHERE hpd_associations.association_type01 = 17000 -- 17000 vale = "Caused By" There are 83 values for association_type01
Desired output would be as below, indication of keep and remoe rows included:
incident_number hpd_associations_request chg_request chg_status_reason chg_infrastructure_change_id
INC000037937498 35000 NULL NULL NULL --Keep
INC000037942103 17000 CRQ000000755311 6000 CRQ000000889056 --Keep
INC000037942103 35000 CRQ000000710522 5000 CRQ000000834307 -- Remove
INC000037942103 35000 NULL NULL NULL -- Remove
INC000037951750 17000 CRQ000000752354 5000 CRQ000000885032 -- Keep
INC000037984428 35000 NULL NULL NULL -- Keep
INC000037985037 35000 NULL NULL NULL -- Keep
INC000037985037 30000 NULL NULL NULL -- Remove
Alternatively should there just be a subquery to get all records from chg_infrastructure_change
where field chg_request
has a CRQ
number AND HPD_ASSOCIATIONS_Request
= 17000
and do a left join?
i.e.
WHERE
.
.
.
and hpd_help_desk.INCIDENT_NUMBER in (
SELECT
distinct(hpd_associations.request_id02)
-- hpd_associations.association_type01 as "HPD_ASSOCIATIONS_Request",
-- chg_infrastructure_change.request_id as "CHG_Request"
FROM helix_access.hpd_associations
LEFT OUTER JOIN helix_access.chg_infrastructure_change ON (chg_infrastructure_change.infrastructure_change_id = hpd_associations.request_id01)
WHERE From_unixtime(Cast(hpd_associations.start_date_01 AS BIGINT),'yyyy-MM-dd HH:mm:ss') >= '2022-03-01 00:00:00.000'
AND hpd_associations.association_type01 = 17000 -- 17000 vale = "Caused By" There are 83 values for association_type01
)
Peter
CodePudding user response:
When dealing with outer joins you need to cater for the possibility of NULLs in columns from those joined tables. This is especially true when it comes to the where clause
because if you don't allow for NULLs when referring to outer joined tables you create the conditions that equate to an inner join.
e.g.
SELECT
hpd_help_desk.INCIDENT_NUMBER
, hpd_associations.association_type01 AS "HPD_ASSOCIATIONS_Request"
, chg_infrastructure_change.request_id AS "CHG_Request"
, chg_infrastructure_change.status_reason AS "CHG_Status_Reason"
, chg_infrastructure_change.description2 AS "CHG__Desc2"
, chg_infrastructure_change.infrastructure_change_id AS "CHG_infrastructure_change_id"
FROM helix_access.hpd_help_desk
LEFT OUTER JOIN helix_access.hpd_associations
ON hpd_help_desk.incident_number = hpd_associations.request_id02
LEFT OUTER JOIN helix_access.chg_infrastructure_change
ON chg_infrastructure_change.infrastructure_change_id = hpd_associations.request_id01
WHERE (hpd_associations.association_type01 = 17000
OR hpd_associations.association_type01 IS NULL)
Also, an alternative to using the where clause you can extend the join conditions
instead e.g.
LEFT OUTER JOIN helix_access.hpd_associations
ON hpd_help_desk.incident_number = hpd_associations.request_id02
AND hpd_associations.association_type01 = 17000
LEFT OUTER JOIN helix_access.chg_infrastructure_change
ON chg_infrastructure_change.infrastructure_change_id = hpd_associations.request_id01
AND chg_infrastructure_change.CRQ IS NOT NULL
AND chg_infrastructure_change.HPD_ASSOCIATIONS_Request = 17000
This will limit the rows included in the result to just those rows that meet ALL conditions in each join.
Regarding "If there hpd_associations.association_type01 has multiple rows per hpd_help_desk.INCIDENT_NUMBER and no chg_request CRQ's / HPD_ASSOCIATIONS_Request = 17000, simply take the first row." This may require use of row_number() over() and to do that would require you to join to a subquery instead of joining diretly to the table. e.g.
LEFT OUTER JOIN (
SELECT *
, row_number() over(partition by request_id02
order by CRQ) as rn
FROM helix_access.hpd_associations
AND hpd_associations.association_type01 = 17000
) AS assocs ON hpd_help_desk.incident_number = assocs.request_id02
AND assocs.rn = 1 /* this is how you get just one row (for each request_id02) */