Home > Blockchain >  How to identify records where one ID column matches in both rows, and another column is populated in
How to identify records where one ID column matches in both rows, and another column is populated in

Time:11-30

Example data:

some_parent_id some_parent_name some_subset_name some_subset_id address
123456 SPECIAL special_shop 9876 1234 road st
null null special_shop 9876 1234 road st
654321 NOT_SPECIAL not_special_shop 9877 1258 diff st
654321 NOT_SPECIAL not_special_shop 9877 1258 diff st

I would like to find a query that returns only the below results - so only if there are 2 total records for some_subset_id, and only if one record has a null some_parent_id and the other record has a non-null some_parent_id.

some_parent_id some_parent_name some_subset_name some_subset_id address
123456 SPECIAL special_shop 9876 1234 road st
null null special_shop 9876 1234 road st

The table I'm working with has well over 2 million records, and there are intentional duplicates, but unfortunately no unique identifier for rows. I did not create the table.

CodePudding user response:

Try something like:

select
   a.*
from
   mytable as a
   inner join
   (
   select
      some_subset_id,
      sum(case when some_parent_id is null then 0 else 1 end) as have_parent_id,
      sum(case when some_parent_id is null then 1 else 0 end) as do_not_have_parent_id
   from
      mytable
   group by
      some_subset_id
   having
      sum(case when some_parent_id is null then 0 else 1 end) = 1
      and sum(case when some_parent_id is null then 1 else 0 end) = 1
   ) as b on b.some_subset_id = a.some_subset_id;

CodePudding user response:

You could use analytic function in a subquery to filter your data (WHERE clause). The subquery defines the SELECTION column with values of IN or OUT so you can use it to fetch the rows marked as either of the values.

SELECT  *
FROM    tbl
WHERE   SUBS_ID IN  (   Select  SUBS_ID
                        From (  Select  SUBS_ID "SUBS_ID", CASE WHEN Count(SUBS_ID) OVER(PARTITION BY SUBS_ID) > 1 And P_ID Is Null THEN 'OUT' ELSE 'IN' END "SELECTION"
                                From    tbl  )
                        Group By  SUBS_ID
                        Having    Max(SELECTION) = 'OUT' )
ORDER BY  SUBS_ID, P_ID NULLS LAST

With your sample data...

WITH
    tbl AS
        (
            Select 123456 "P_ID", 'SPECIAL'     "P_NAME", 'Special Shop'     "SUBS_NAME", 9876 "SUBS_ID", '1234, Road St.' "ADDRESS" From Dual Union All
            Select Null   "P_ID",  Null         "P_NAME", 'Special Shop'     "SUBS_NAME", 9876 "SUBS_ID", '1234, Road St.' "ADDRESS" From Dual Union All
            Select 654321 "P_ID", 'NOT_SPECIAL' "P_NAME", 'Not Special Shop' "SUBS_NAME", 9877 "SUBS_ID", '1234, Road St.' "ADDRESS" From Dual Union All
            Select 654321 "P_ID", 'NOT_SPECIAL' "P_NAME", 'Not Special Shop' "SUBS_NAME", 9877 "SUBS_ID", '1234, Road St.' "ADDRESS" From Dual
        )

... and with ' Having Max(SELECTION) = 'OUT' ' result is:

P_ID P_NAME SUBS_NAME SUBS_ID ADDRESS
123456 SPECIAL Special Shop 9876 1234, Road St.
Special Shop 9876 1234, Road St.

... and with ' Having Max(SELECTION) = 'IN' ' result is:

P_ID P_NAME SUBS_NAME SUBS_ID ADDRESS
654321 NOT_SPECIAL Not Special Shop 9877 1234, Road St.
654321 NOT_SPECIAL Not Special Shop 9877 1234, Road St.
  • Related