I have a data set as following. To keep it simple I am showing only one column i.e qty but there are multiples for example (dates, prices etc). if any of the value within a single row is a mismatch then that row should show up as output along with the values from both sources, for example:
source 1
id qty
1 100
2 null
3 0
4 50
source 2
id_ qty_
1 100
2 80
3 100
expected output:
id qty id_ qty_
2 null 2 80
3 0 3 100
4 50 null null
What i am trying to achieve is to fetch all the rows from source 2 that doesn't match with the source 1. Source 1 is taken as main source/table.
In the end the output should show only row 2,3,4 with both values so that i can build a boolean saying:
if (qty != qty_ , false, true) as is_qty_matching
An example with multiple values:
source 1
id qty price
1 100 10
2 null 0
3 0 0
4 50 0
5 100 30
source 2
id_ qty_ price
1 100 10
2 80 0
3 100 0
5 100 33
expected output:
id qty price id_ qty_ price_
2 null 0 2 80 0
3 0 0 3 100 0
4 50 0 null null 0
5 100 30 5 100 33
In this case now id 5 has price mismatch
so my two new columns (is_price_matching) would show false for id 5 while true for other id', while the column (is_qty_matching) will show true for id 5 and false for others.
Similarly i will have multiple other columns and therefore multiple other boolean fields for each of the mismatch entry for example is_date_matching, is_time_matching etc.
what's important is that:
- The values can be null in either of source for any of the column, this might mess up the boolean maybe
- only the rows that has even one single mismatch value should show up in the final output, if there is no mismatch in any of the column values it should not show up
I have tried the following query:
with main as (
select 1 as id , 100 as qty , 50 as price
union all
select 2 as id , 0 as qty , 100 as price
union all
select 3 as id , 0 as qty ,80 as price
union all
select 4 as id , 50 as qty , 90 as price
union all
select 5 as id , 20 as qty , 100 as price
union all
select 6 as id , 20 as qty , 100 as price
),
main2 as (
select 1 as id_, 100 as qty_ , 50 as price_
union all
select 2 as id_, 80 as qty_ , 100 as price_
union all
select 3 as id_, 100 as qty_ , 80 as price_
union all
select 5 as id_, 20 as qty_ , 100 as price_
union all
select 6 as id_, 20 as qty_ , 40 as price_
)
select
main.*,
main2.*
from main
left join main2
on (main.id = main2.id_)
WHERE coalesce(qty,0) != coalesce(qty_,0)
or coalesce(main.price,0) != coalesce(main2.price_,0)
it seems to work but i was wondering if there is a better solution ? plus if i add the following line of code
if(qty != qty_ , true , false) as is_qty_mismatch
it will return incorrect output where the values are null
CodePudding user response:
There are 2 things that your query doesn't account for.
- The second table having ID that's not present in 1st table
- You can't compare null to other values (it gives you null).
To fix the best problem changing join from left to full should work (row 31), for the second wrapping your logic with ifnull
function. Also if the output for if
functions are boolean values, than you can skip the function and keep just logical statement (rows 28 and 29)
with main as (
select 1 as id , 100 as qty , 50 as price
union all
select 2 as id , 0 as qty , 100 as price
union all
select 3 as id , 0 as qty ,80 as price
union all
select 4 as id , 50 as qty , 90 as price
union all
select 5 as id , 20 as qty , 100 as price
union all
select 6 as id , 20 as qty , 100 as price
),
main2 as (
select 1 as id_, 100 as qty_ , 50 as price_
union all
select 2 as id_, 80 as qty_ , 100 as price_
union all
select 3 as id_, 100 as qty_ , 80 as price_
union all
select 5 as id_, 20 as qty_ , 100 as price_
union all
select 6 as id_, 20 as qty_ , 40 as price_
)
select
main.*,
main2.*,
ifnull(qty != qty_ , true) as is_qty_mismatch,
ifnull(price != price_ , true) as is_qty_mismatch,
from main
full join main2
on (main.id = main2.id_)
where coalesce(qty,0) != coalesce(qty_,0)
or coalesce(main.price,0) != coalesce(main2.price_,0)
CodePudding user response:
Below might be an another option,
SELECT * FROM (
SELECT * FROM (SELECT *, '1' src FROM main EXCEPT DISTINCT SELECT *, '1' src FROM main2)
UNION ALL
SELECT * FROM (SELECT *, '2' src FROM main2 EXCEPT DISTINCT SELECT *, '2' src FROM main)
) PIVOT (ANY_VALUE(STRUCT(qty, price)) main FOR src IN ('1', '2'));
The EXCEPT operator returns rows from the left input query that are not present in the right input query.
Key is how to remove duplicated rows in both tables, you can consider below approach as well using an analytic function.
SELECT * FROM (
SELECT * FROM (
SELECT *, '1' src FROM main UNION ALL SELECT *, '2' FROM main2
) QUALIFY COUNT(1) OVER (PARTITION BY id, qty, price) = 1
) PIVOT (ANY_VALUE(STRUCT(qty, price)) main FOR src IN ('1', '2'));
-- output will be same as above