I am new to making complex (let me know if this one is not complex) SQL queries and am struggling to create a query that will join 2 tables with multiple different combinations. Here is my query for just the 1st combination.
SELECT DISTINCT
A.Id, A.Ref1, A.Ref2, A.Ref3, B.Ref1, B.Ref2, B.Ref3
FROM
A
JOIN
B ON TRIM(UPPER(B.Ref1)) = TRIM(UPPER(A.Ref1))
AND TRIM(UPPER(B.Ref2)) = TRIM(UPPER(A.Ref2))
AND TRIM(UPPER(B.Ref3)) = TRIM(UPPER(A.Ref3))
Ref1 - Ref3
are columns that are necessary in the 1st combination and this is classified as an "Exact Match".
However, if it so happens that only Ref1 and Ref2 matches, then it is classified as a "Partial Match" instead. Here is a list of possible combinations:
- A.Ref1 = B.Ref1 and A.Ref2 = B.Ref2 and A.Ref3 = B.Ref3 = Exact Match
- A.Ref1 = B.Ref1 and A.Ref2 = B.Ref2 = Partial Match 1
- A.Ref2 = B.Ref2 and A.Ref3 = B.Ref3 = Partial Match 1
- A.Ref1 = B.Ref1 and A.Ref3 = B.Ref3 = Partial Match 1
- A.Ref1 = B.Ref1 = Partial Match 2
- A.Ref2 = B.Ref2 = Partial Match 2
- A.Ref3 = B.Ref3 = Partial Match 2
My initial attempt at this looked to have been working just fine but eventually broke once I added values that would make the result set non-distinct between the matches.
Anyway, for the actual goal, I intended for Exact Match entries to not be included when searching Partial Match 1 entries, then for Partial 2, Partial Match 1 and Exact Match should not be included (not sure if I am making enough sense at this point).
My initial attempts keep duplicating records which is really just redundant data since if it already matched exactly, the record should not have been partially matched as well. Here is a sample of the result I got:
Id A.Ref1 A.Ref2 A.Ref3 B.Ref1 B.Ref2 B.Ref3 Match Type
1 Val1 Val1 Val1 Val1 Val1 Val1 Exact Match
1 Val1 Val1 NULL Val1 Val1 NULL Partial Match 1
1 Val1 NULL Val1 Val1 NULL Val1 Partial Match 1
1 Val1 NULL NULL Val1 NULL NULL Partial Match 2
In the scenario above, since Id 1 already is an "Exact Match", I don't want it to appear multiple times for "Partial Match" except for when there are other distinct field values for the record.
I am trying to do this all in just 1 query but I don't think there should be limitations to have them as separate ones instead. It would probably be easier that way but please do let me know if there is a possibility of it being done in just 1 go as this will be executed several times.
Any help on the matter will be much appreciated. Thank you.
CodePudding user response:
You can cross join the two tables and then count how many columns match:
select
id, a_ref1, a_ref2, a_ref3, b_ref1, b_ref2, b_ref3,
case match_count
when 3 then 'Exact Match'
when 2 then 'Partial Match 1'
when 1 then 'Partial Match 2'
else 'No Match'
end as match_type
from
(
select
a.id,
a.ref1 as a_ref1, a.ref2 as a_ref2, a.ref3 as a_ref3,
b.ref1 as b_ref1, b.ref2 as b_ref2, b.ref3 as b_ref3,
case when trim(upper(b.ref1)) = trim(upper(a.ref1)) then 1 else 0 end
case when trim(upper(b.ref2)) = trim(upper(a.ref2)) then 1 else 0 end
case when trim(upper(b.ref3)) = trim(upper(a.ref3)) then 1 else 0 end
as match_count
from a cross join b
) match_counted
-- where match_count = 3 /* only Exact Matches */
-- where match_count = 2 /* only Partial Matches Type 1 */
-- where match_count = 1 /* only Partial Matches Type 2 */
-- where match_count = 0 /* only Non-Matches */
;
You can use any of the suggested WHERE
clauses in order to only get a particular match type. Or a variation if you want to pick several, e.g. where match_count >0
or where match_count in (1,2)
.
If you want to exclude non-matches, you can inner join the tables instead on trim(upper(b.ref1)) = trim(upper(a.ref1)) or trim(upper(b.ref2)) = trim(upper(a.ref2)) or trim(upper(b.ref3)) = trim(upper(a.ref3))
. This may make the query faster (but is not guaranteed to do so). It will, however, make the query a tad more prone to errors and a lower its maintainability, because you'd state the conditions twice.
CodePudding user response:
You can use CASE WHEN ... THEN in your SELECT clause to implement your match rules.
SELECT *
FROM (
Select Distinct A.Id, A.Ref1, A.Ref2, A.Ref3, B.Ref1, B.Ref2, B.Ref3,
CASE
WHEN A.Ref1 = B.Ref1 AND A.Ref2 = B.Ref2 AND A.Ref3 = B.Ref3
THEN 'Exact Match'
WHEN A.Ref1 = B.Ref1 AND A.Ref2 = B.Ref2
THEN 'Partial Match 1'
WHEN A.Ref2 = B.Ref2 AND A.Ref3 = B.Ref3
THEN 'Partial Match 1'
WHEN A.Ref1 = B.Ref1 AND A.Ref3 = B.Ref3
THEN 'Partial Match 1'
WHEN (A.Ref1 = B.Ref1 OR A.Ref2 = B.Ref2 OR A.Ref3 = B.Ref3)
THEN 'Partial Match 2'
ELSE NULL
END Match
from A
Join B
On TRIM(UPPER(B.Ref1)) = TRIM(UPPER(A.Ref1)) And
TRIM(UPPER(B.Ref2)) = TRIM(UPPER(A.Ref2)) And
TRIM(UPPER(B.Ref3)) = TRIM(UPPER(A.Ref3))
) subquery
WHERE Match = 'Partial Match 1'
This gets the desired items from your resultset, omitting the ones you don't want.
SQL is all about sets and subqueries.
It's a bit verbose, but so are your matching rules.