Home > Mobile >  Databricks, comparing two tables to see which records are missing
Databricks, comparing two tables to see which records are missing

Time:12-16

I'm looking into two tables that are supposed to be equal. I run this query to see which records are missing in table B against table A (we have a 3-columned key):

select * 
from tableA A 
left join TableB B 
   on A.joinField1 = B.joinField1 
  and A.joinField2 = B.joinField2 
  and A.joinField3 = B.joinField3 
where B.joinField1 is null 
   or B.joinField2 is null 
   or B.joinField3 is null 

This way, should a record in A be missing in B, it gets filtered in this query (based on the key). For some reason, when I randomly pick one of these missing records and look it up directly in table B (with a simple select, filtered on the key), it shows up. Why does my query include them when actually there is a match? there are no null values and fields formats match.

CodePudding user response:

We can use EXCEPT command for this . EXCEPT and EXCEPT ALL return the rows that are found in one relation but not the other. EXCEPT (alternatively, EXCEPT DISTINCT) takes only distinct rows while EXCEPT ALL does not remove duplicates from the result rows. Note that MINUS is an alias for EXCEPT. You can ref link

  • Related