Find matching rows in database table using SQL where no matching key is present

Time:05-31

I have an old table with legacy data and approx 10,000 rows and a new table with about 500 rows. The columns are the same in both tables. I need to compare a few columns in the new table with the old one and report on data that is duplicated in the new table.

I've researched articles with similar issues, attempted table joins and where exists / where not exists clauses but I just can't get the SQL right. I have included my latest version.

One issue causing trouble for me, I think, is that there is no key as such, like a userid or similar unique identifier, in either table.

What I want to do is find the rows in the "new" table where every column except "reference_number" (it doesn't matter whether that one matches or not) is duplicated, i.e. already exists in the "old" table.

I have this so far...

select
  old.reference_number,
  new.reference_number,
  new.component,
  new.privileges,
  new.protocol,
  new.authority,
  new.score,
  new.means,
  new.difficulty,
  new.hierarchy,
  new.interaction,
  new.scope,
  new.conf,
  new.integrity,
  new.availability,
  new.version
from old, new
where old.component = new.component
  and old.privileges = new.privileges
  and old.protocol = new.protocol
  and old.authority = new.authority
  and old.score = new.score
  and old.means = new.means
  and old.difficulty = new.difficulty
  and old.hierarchy = new.hierarchy
  and old.interaction = new.interaction
  and old.scope = new.scope
  and old.conf = new.conf
  and old.integrity = new.integrity
  and old.availability = new.availability
  and old.version = new.version

I have tried this here but it doesn't seem to pull out ALL of the data for some reason.

It is evident that actually there are MORE rows in the old table that are duplicated in the new table but I'm only getting a small number of rows returned from the query.

Can anyone spot why that might be? Is there another way I should be approaching this?

If it matters, this is PostgreSQL.

Thanks for any help given.

CodePudding user response:

The following should do what you want:

select distinct o.reference_number,
                n.reference_number,
                n.component,
                n.privileges,
                n.protocol,
                n.authority,
                n.score,
                n.means,
                n.difficulty,
                n.hierarchy,
                n.interaction,
                n.scope,
                n.conf,
                n.integrity,
                n.availability,
                n.version
  from new n
  inner join old o
    on o.component = n.component and
       o.privileges = n.privileges and
       o.protocol = n.protocol and
       o.authority = n.authority and
       o.score = n.score and
       o.means = n.means and
       o.difficulty = n.difficulty and
       o.hierarchy = n.hierarchy and
       o.interaction = n.interaction and
       o.scope = n.scope and
       o.conf = n.conf and
       o.integrity = n.integrity and
       o.availability = n.availability and
       o.version = n.version
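
If a join like this still returns fewer rows than expected, one possible cause is NULL values in the compared columns. This is an assumption, since the question does not say whether any column is nullable. In SQL, `null = null` is not true, so two rows that are both NULL in the same column never match under plain `=`. PostgreSQL's `is not distinct from` treats two NULLs as equal. The sketch below uses hypothetical miniature tables (`old_demo`, `new_demo`, with made-up types and values) rather than the real ones:

```sql
-- Hypothetical miniature of the two tables; names, types, and values are assumed.
create temp table old_demo (reference_number int, component text, score numeric);
create temp table new_demo (reference_number int, component text, score numeric);

insert into old_demo values (1, 'kernel', 7.5), (2, 'libc', null);
insert into new_demo values (10, 'kernel', 7.5), (11, 'libc', null);

-- Plain equality: only the 'kernel' pair (1, 10) is returned,
-- because the comparison null = null does not evaluate to true.
select o.reference_number as old_ref, n.reference_number as new_ref
  from new_demo n
  inner join old_demo o
    on o.component = n.component and
       o.score = n.score;

-- Null-safe comparison: both pairs, (1, 10) and (2, 11), are returned.
select o.reference_number as old_ref, n.reference_number as new_ref
  from new_demo n
  inner join old_demo o
    on o.component is not distinct from n.component and
       o.score is not distinct from n.score;
```

If NULLs do turn out to be the cause, the same substitution could be applied to each nullable column in the full query.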

CodePudding user response:

You could use a left join and then keep only the rows where the new values are null (that is, the rows in old that have no counterpart in new). The SQL should be something like this:

select
  old.reference_number,
  new.reference_number,
  new.component,
  new.privileges,
  new.protocol,
  new.authority,
  new.score,
  new.means,
  new.difficulty,
  new.hierarchy,
  new.interaction,
  new.scope,
  new.conf,
  new.integrity,
  new.availability,
  new.version
from old
  left join new
  on old.component = new.component
  and old.privileges = new.privileges
  and old.protocol = new.protocol
  and old.authority = new.authority
  and old.score = new.score
  and old.means = new.means
  and old.difficulty = new.difficulty
  and old.hierarchy = new.hierarchy
  and old.interaction = new.interaction
  and old.scope = new.scope
  and old.conf = new.conf
  and old.integrity = new.integrity
  and old.availability = new.availability
  and old.version = new.version
where new.component is null
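
To make the direction of the left join concrete, here is a small sketch on hypothetical tables (`old_t` and `new_t`, with made-up types and values, and only two compared columns instead of fourteen). It assumes the compared columns contain no NULLs. The `is null` filter keeps rows that found no partner, while an `is not null` filter keeps the duplicated ones, which is closer to what the question asks for:

```sql
-- Hypothetical miniature tables; names, types, and values are assumed.
create temp table old_t (reference_number int, component text, score numeric);
create temp table new_t (reference_number int, component text, score numeric);

insert into old_t values (1, 'kernel', 7.5), (2, 'libc', 5.0), (3, 'ssh', 9.8);
insert into new_t values (10, 'kernel', 7.5), (11, 'tls', 4.3);

-- Anti-join: old rows with no counterpart in new_t (references 2 and 3).
select o.reference_number
  from old_t o
  left join new_t n
    on o.component = n.component and
       o.score = n.score
 where n.component is null;

-- Starting from new_t and keeping only matched rows reports the
-- duplicated new row together with its old counterpart: (10, 1).
select n.reference_number as new_ref, o.reference_number as old_ref
  from new_t n
  left join old_t o
    on o.component = n.component and
       o.score = n.score
 where o.component is not null;
```

The second query is equivalent to an inner join; it is written as a left join only to show how the filter changes the result.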