SQL - Select rows from join result that contain unique column pairs-CodePudding

I'm running a query where I want to ignore most of the returned rows (checking sets of results with Excel/visually suggests that the rows I'm looking for typically comprise about 0.1 - 0.2% of the row total so far). A simplified version of the current query:

SELECT
  in.source,
  in.version,

  r.report_id,
  r.status,

FROM report r
  LEFT JOIN inbound in ON r.ref = in.id

WHERE 1=1
--  AND various search conditions

This will produce results like

SOURCE	VERSION	REPORT_ID	STATUS
A	1	123	DONE
B	1	123	DONE
A	0	456	CANC
B	0	456	CLOSED
B	3	789	DONE
A	5	789	DONE

Ideally, I'd like to modify the query so that what is returned is any row where the pair (VERSION, REPORT_ID) occurs precisely once. So in the above example, the only rows that should be in the result are

B  3  789  DONE
A  5  789  DONE

because no other rows exist that have VERSION = 3, REPORT_ID = 789 or VERSION = 5, REPORT_ID = 789, respectively.

Another way to describe what I need, is that for every row from SOURCE A with VERSION = X, REPORT_ID = Y, there should be a row from SOURCE B with VERSION = X, REPORT_ID = Y. I'm interested in the rows that break this requirement, e.g. there exists a row from one source with a (VERSION, REPORT_ID) pair such that no row from the other source with a matching pair exists.

Thank you in advance.

CodePudding user response：

One option is to use result of your current query (ycq in example that follows) as a "source" for yet another query; using the count function in its analytic form, calculate which (version, report_id) combinations appears only once.

SQL> with ycq (source, version, report_id, status) as
  2    (select 'A', 1, 123, 'DONE'   from dual union all
  3     select 'B', 1, 123, 'DONE'   from dual union all
  4     select 'A', 0, 456, 'CANC'   from dual union all
  5     select 'B', 0, 456, 'CLOSED' from dual union all
  6     select 'B', 3, 789, 'DONE'   from dual union all
  7     select 'A', 5, 789, 'DONE'   from dual
  8     --
  9     union all
 10     select 'A', 1, 123, 'XXX' from dual
 11    )
 12  select source, version, report_id, status
 13  from (select y.*,
 14               count(*) over (partition by version, report_id) cnt
 15        from ycq y
 16       )
 17  where cnt = 1;

SOURCE        VERSION  REPORT_ID STATUS
---------- ---------- ---------- ------
B                   3        789 DONE
A                   5        789 DONE

SQL>

Or, using your query, literally:

with ycq (source, version, report_id, status) as
  (SELECT
      in.source,
      in.version,
      r.report_id,
      r.status,
    FROM report r
      LEFT JOIN inbound in ON r.ref = in.id
    WHERE 1=1
    --  AND various search conditions
  )
select source, version, report_id, status
from (select y.*,
             count(*) over (partition by version, report_id) cnt
      from ycq y
     )
where cnt = 1;

CodePudding user response：

Query by comparing yourself with yourself

with rin (SELECT
      in.source,
      in.version,
      r.report_id,
      r.status,
    FROM report r LEFT JOIN inbound in ON r.ref = in.id
    WHERE 1=1
 --  AND various search conditions
   )
select * from rin y where not exists(select * from rin y2 where y2.source<>y.source and y2.version=y.version and y2.report_id=y.report_id)