I'm running a query where I want to ignore most of the returned rows (checking sets of results with Excel/visually suggests that the rows I'm looking for typically comprise about 0.1 - 0.2% of the row total so far). A simplified version of the current query:
SELECT
in.source,
in.version,
r.report_id,
r.status,
FROM report r
LEFT JOIN inbound in ON r.ref = in.id
WHERE 1=1
-- AND various search conditions
This will produce results like
SOURCE | VERSION | REPORT_ID | STATUS |
---|---|---|---|
A | 1 | 123 | DONE |
B | 1 | 123 | DONE |
A | 0 | 456 | CANC |
B | 0 | 456 | CLOSED |
B | 3 | 789 | DONE |
A | 5 | 789 | DONE |
Ideally, I'd like to modify the query so that what is returned is any row where the pair (VERSION, REPORT_ID)
occurs precisely once. So in the above example, the only rows that should be in the result are
B 3 789 DONE
A 5 789 DONE
because no other rows exist that have VERSION = 3, REPORT_ID = 789
or VERSION = 5, REPORT_ID = 789
, respectively.
Another way to describe what I need, is that for every row from SOURCE A
with VERSION = X, REPORT_ID = Y
, there should be a row from SOURCE B
with VERSION = X, REPORT_ID = Y
. I'm interested in the rows that break this requirement, e.g. there exists a row from one source with a (VERSION, REPORT_ID)
pair such that no row from the other source with a matching pair exists.
Thank you in advance.
CodePudding user response:
One option is to use result of your current query (ycq
in example that follows) as a "source" for yet another query; using the count
function in its analytic form, calculate which (version, report_id
) combinations appears only once.
SQL> with ycq (source, version, report_id, status) as
2 (select 'A', 1, 123, 'DONE' from dual union all
3 select 'B', 1, 123, 'DONE' from dual union all
4 select 'A', 0, 456, 'CANC' from dual union all
5 select 'B', 0, 456, 'CLOSED' from dual union all
6 select 'B', 3, 789, 'DONE' from dual union all
7 select 'A', 5, 789, 'DONE' from dual
8 --
9 union all
10 select 'A', 1, 123, 'XXX' from dual
11 )
12 select source, version, report_id, status
13 from (select y.*,
14 count(*) over (partition by version, report_id) cnt
15 from ycq y
16 )
17 where cnt = 1;
SOURCE VERSION REPORT_ID STATUS
---------- ---------- ---------- ------
B 3 789 DONE
A 5 789 DONE
SQL>
Or, using your query, literally:
with ycq (source, version, report_id, status) as
(SELECT
in.source,
in.version,
r.report_id,
r.status,
FROM report r
LEFT JOIN inbound in ON r.ref = in.id
WHERE 1=1
-- AND various search conditions
)
select source, version, report_id, status
from (select y.*,
count(*) over (partition by version, report_id) cnt
from ycq y
)
where cnt = 1;
CodePudding user response:
Query by comparing yourself with yourself
with rin (SELECT
in.source,
in.version,
r.report_id,
r.status,
FROM report r LEFT JOIN inbound in ON r.ref = in.id
WHERE 1=1
-- AND various search conditions
)
select * from rin y where not exists(select * from rin y2 where y2.source<>y.source and y2.version=y.version and y2.report_id=y.report_id)