Home > Back-end >  Query to return nonmatching lines in two arbitrary tables
Query to return nonmatching lines in two arbitrary tables

Time:08-19

I have two sets of tables (i.e. a.1, a.2, a.3, b.1, b.2, b.3, etc) created using slightly different logic. The analogous table in the two schemas have the exact same columns (i.e. a.1 has the same columns as b.1). My belief is that the tables in the two schemas should contain the exact same information, but I want to test that belief. Therefore I want to write a query that compares two analogous tables and returns lines that are not in both tables. Is there an easy way to write a query to do that without manually writing the join? In other words, can I have a query that can produce the results that I want where I only have to change the table names I want to compare while leaving the rest of the query unchanged?

To be a bit more explicit, I'm looking to do something like the following:

select * 
from a.1 
where (all columns in a.1) not in (select * from b.1);

If I could write something like this then all I would have to do to compare a.2 to b.2 would be to change the table names. However, it's not clear to me how to come up with the (all columns in a.1) piece in a general way.

Based on a recommendation in the comments, I've created the following showing the kind of thing I'd like to see:

https://dbfiddle.uk/?rdbms=db2_11.1&fiddle=ad0141b0daf8f8f92e6e3fa8d57e67ad

CodePudding user response:

If you have a primary key to match rows between the tables, then you can try a full anti-join. For example:

select a.id as aid, b.id as bid
from a
full join b on b.id = a.id
where a.id is null or b.id is null

If the tables are:

A: 1, 2, 3
B: 1, 2, 4

The result is:

AID  BID
---- ----
null    4   -- means it's present in B, but not in A
   3 null   -- means it's present in A, but not in B

See running example at db<>fiddle.

Of course, if your tables do not have a primary key, or if the rows are inconsistent (same PK, different data), then you'll need to adjust the query.

CodePudding user response:

As an alternative you can try this:

select 'a1' t,* from (
select a1.*,row_number() over (partition by c1 order by 1) as rn from a1
minus
select b1.*,row_number() over (partition by c1 order by 1) as rn from b1
)
union all
select 'b1' t,* from (
select b1.*,row_number() over (partition by c1 order by 1) as rn from b1
minus
select a1.*,row_number() over (partition by c1 order by 1) as rn from a1
) 

fiddle

edit: you can shorten the query by precalculating the rn part, instead of doing the same calculation again.

CodePudding user response:

I was looking for the except clause.

So

select * 
from a.1 
where (all columns in a.1) not in (select * from b.1);

can be written as

select * from a.1 
except
select * from b.1

In db-fiddle I give an explicit exmaple of what I wanted.

  • Related