Example data:
some_parent_id | some_parent_name | some_subset_name | some_subset_id | address |
---|---|---|---|---|
123456 | SPECIAL | special_shop | 9876 | 1234 road st |
null | null | special_shop | 9876 | 1234 road st |
654321 | NOT_SPECIAL | not_special_shop | 9877 | 1258 diff st |
654321 | NOT_SPECIAL | not_special_shop | 9877 | 1258 diff st |
I would like to find a query that returns only the below results - so only if there are 2 total records for some_subset_id, and only if one record has a null some_parent_id and the other record has a non-null some_parent_id.
some_parent_id | some_parent_name | some_subset_name | some_subset_id | address |
---|---|---|---|---|
123456 | SPECIAL | special_shop | 9876 | 1234 road st |
null | null | special_shop | 9876 | 1234 road st |
The table I'm working with has well over 2 million records, and there are intentional duplicates, but unfortunately no unique identifier for rows. I did not create the table.
CodePudding user response:
Try something like:
select
a.*
from
mytable as a
inner join
(
select
some_subset_id,
sum(case when some_parent_id is null then 0 else 1 end) as have_parent_id,
sum(case when some_parent_id is null then 1 else 0 end) as do_not_have_parent_id
from
mytable
group by
some_subset_id
having
sum(case when some_parent_id is null then 0 else 1 end) = 1
and sum(case when some_parent_id is null then 1 else 0 end) = 1
) as b on b.some_subset_id = a.some_subset_id;
CodePudding user response:
You could use analytic function in a subquery to filter your data (WHERE clause). The subquery defines the SELECTION column with values of IN or OUT so you can use it to fetch the rows marked as either of the values.
SELECT *
FROM tbl
WHERE SUBS_ID IN ( Select SUBS_ID
From ( Select SUBS_ID "SUBS_ID", CASE WHEN Count(SUBS_ID) OVER(PARTITION BY SUBS_ID) > 1 And P_ID Is Null THEN 'OUT' ELSE 'IN' END "SELECTION"
From tbl )
Group By SUBS_ID
Having Max(SELECTION) = 'OUT' )
ORDER BY SUBS_ID, P_ID NULLS LAST
With your sample data...
WITH
tbl AS
(
Select 123456 "P_ID", 'SPECIAL' "P_NAME", 'Special Shop' "SUBS_NAME", 9876 "SUBS_ID", '1234, Road St.' "ADDRESS" From Dual Union All
Select Null "P_ID", Null "P_NAME", 'Special Shop' "SUBS_NAME", 9876 "SUBS_ID", '1234, Road St.' "ADDRESS" From Dual Union All
Select 654321 "P_ID", 'NOT_SPECIAL' "P_NAME", 'Not Special Shop' "SUBS_NAME", 9877 "SUBS_ID", '1234, Road St.' "ADDRESS" From Dual Union All
Select 654321 "P_ID", 'NOT_SPECIAL' "P_NAME", 'Not Special Shop' "SUBS_NAME", 9877 "SUBS_ID", '1234, Road St.' "ADDRESS" From Dual
)
... and with ' Having Max(SELECTION) = 'OUT' ' result is:
P_ID | P_NAME | SUBS_NAME | SUBS_ID | ADDRESS |
---|---|---|---|---|
123456 | SPECIAL | Special Shop | 9876 | 1234, Road St. |
Special Shop | 9876 | 1234, Road St. |
... and with ' Having Max(SELECTION) = 'IN' ' result is:
P_ID | P_NAME | SUBS_NAME | SUBS_ID | ADDRESS |
---|---|---|---|---|
654321 | NOT_SPECIAL | Not Special Shop | 9877 | 1234, Road St. |
654321 | NOT_SPECIAL | Not Special Shop | 9877 | 1234, Road St. |