Home > Blockchain >  SQL - Get per column count of differences when comparing two tables
SQL - Get per column count of differences when comparing two tables

Time:07-07

I have 2 similar tables as shown below with minor difference between some cells

Table A

Roll_ID FirstName LastName Age
1 AAA XXX 31
2 BBB YYY 32
3 CCC ZZZ 33

Table B

Roll_ID FirstName LastName Age
1 AAA XXX 35
2 PPP YYY 36
3 QQQ WWW 37

I would like to get an output that shows the count of different records on a per-column level.

For example the output of the query for the above scenario should be

Output

Roll_ID FirstName LastName Age
0 2 1 3

For this question we can assume that there will always be one column which will have non-null unique values (or one column which may be primary key). In above example Roll_ID is such a column.

My question is: What would be the most efficient way to get such an output? Is there anything to keep in mind when running such query for tables that may have millions of records from point of view of efficiency?

CodePudding user response:

First you have to join the tables

   SELECT *
   FROM table1
   JOIN table2 on table1.ROLL_ID = table2.ROLL_ID

Now just add the counts

   SELECT 
      SUM(CASE WHEN table1.FirstName <> table2.FirstName THEN 1 ELSE 0 END) as FirstNameDiff,
      SUM(CASE WHEN table1.LastName <> table2.LastName THEN 1 ELSE 0 END) as LastNameDiff,
      SUM(CASE WHEN table1.Age <> table2.Age THEN 1 ELSE 0 END) as AgeDiff
   FROM table1
   JOIN table2 on table1.ROLL_ID = table2.ROLL_ID

If an id not existing in both tables is considered "different" then you would need something like this

   SELECT 
      SUM(CASE WHEN COALESCE(table1.FirstName,'x') <> COALESCE(table2.FirstName,'y') THEN 1 ELSE 0 END) as FirstNameDiff,
      SUM(CASE WHEN COALESCE(table1.LastName,'x') <> COALESCE(table2.LastName,'y') THEN 1 ELSE 0 END) as LastNameDiff,
      SUM(CASE WHEN COALESCE(table1.Age,-1) <> COALESCE(table2.Age,-2) THEN 1 ELSE 0 END) as AgeDiff
   FROM ( SELECT table1.Roll_id FROM table1 
          UNION ALL
          SELECT table2.Roll_id FROM table2  
   ) base
   JOIN table1 on table1.ROLL_ID = base.ROLL_ID
   JOIN table2 on table2.ROLL_ID = base.ROLL_ID

Here we get all the roll_ids and then left join back to the tables. This is much better than a cross join if the roll_id column is indexed.

CodePudding user response:

SELECT SUM(IIF(ISNULL(A.FirstName, '') <> ISNULL(B.FirstName, ''), 1, 0)) AS FirstNameRecordDiff,
       SUM(IIF(ISNULL(A.LastName, '') <> ISNULL(B.LastName, ''), 1, 0)) AS LastNameRecordDiff,
       SUM(IIF(ISNULL(A.Age, 0) <> ISNULL(B.Age, 0), 1, 0)) AS LastNameRecordDiff
FROM A
    FULL OUTER JOIN B
        ON B.Roll_ID = A.Roll_ID;

This query intentionally allows nulls to equal, assuming that a lack of data would mean the same thing to the end user.

As written, it would only work on SQL Server. To use it for MySQL or Oracle, the query would vary.

  • Related