I have 2 similar tables as shown below with minor difference between some cells
Table A
Roll_ID | FirstName | LastName | Age |
---|---|---|---|
1 | AAA | XXX | 31 |
2 | BBB | YYY | 32 |
3 | CCC | ZZZ | 33 |
Table B
Roll_ID | FirstName | LastName | Age |
---|---|---|---|
1 | AAA | XXX | 35 |
2 | PPP | YYY | 36 |
3 | QQQ | WWW | 37 |
I would like to get an output that shows the count of different records on a per-column level.
For example the output of the query for the above scenario should be
Output
Roll_ID | FirstName | LastName | Age |
---|---|---|---|
0 | 2 | 1 | 3 |
For this question we can assume that there will always be one column which will have non-null unique values (or one column which may be primary key). In above example Roll_ID is such a column.
My question is: What would be the most efficient way to get such an output? Is there anything to keep in mind when running such query for tables that may have millions of records from point of view of efficiency?
CodePudding user response:
First you have to join the tables
SELECT *
FROM table1
JOIN table2 on table1.ROLL_ID = table2.ROLL_ID
Now just add the counts
SELECT
SUM(CASE WHEN table1.FirstName <> table2.FirstName THEN 1 ELSE 0 END) as FirstNameDiff,
SUM(CASE WHEN table1.LastName <> table2.LastName THEN 1 ELSE 0 END) as LastNameDiff,
SUM(CASE WHEN table1.Age <> table2.Age THEN 1 ELSE 0 END) as AgeDiff
FROM table1
JOIN table2 on table1.ROLL_ID = table2.ROLL_ID
If an id not existing in both tables is considered "different" then you would need something like this
SELECT
SUM(CASE WHEN COALESCE(table1.FirstName,'x') <> COALESCE(table2.FirstName,'y') THEN 1 ELSE 0 END) as FirstNameDiff,
SUM(CASE WHEN COALESCE(table1.LastName,'x') <> COALESCE(table2.LastName,'y') THEN 1 ELSE 0 END) as LastNameDiff,
SUM(CASE WHEN COALESCE(table1.Age,-1) <> COALESCE(table2.Age,-2) THEN 1 ELSE 0 END) as AgeDiff
FROM ( SELECT table1.Roll_id FROM table1
UNION ALL
SELECT table2.Roll_id FROM table2
) base
JOIN table1 on table1.ROLL_ID = base.ROLL_ID
JOIN table2 on table2.ROLL_ID = base.ROLL_ID
Here we get all the roll_ids and then left join back to the tables. This is much better than a cross join if the roll_id column is indexed.
CodePudding user response:
SELECT SUM(IIF(ISNULL(A.FirstName, '') <> ISNULL(B.FirstName, ''), 1, 0)) AS FirstNameRecordDiff,
SUM(IIF(ISNULL(A.LastName, '') <> ISNULL(B.LastName, ''), 1, 0)) AS LastNameRecordDiff,
SUM(IIF(ISNULL(A.Age, 0) <> ISNULL(B.Age, 0), 1, 0)) AS LastNameRecordDiff
FROM A
FULL OUTER JOIN B
ON B.Roll_ID = A.Roll_ID;
This query intentionally allows nulls to equal, assuming that a lack of data would mean the same thing to the end user.
As written, it would only work on SQL Server. To use it for MySQL or Oracle, the query would vary.