SQL optional match based on number of matches in other table & optional column match-CodePudding

I am trying to figure out how to solve an issue with a left join with two large tables (150k x 300k rows).

I have a key that should be unique, but sometimes isn't, in the format of 11002_13202. It is possible that this value occurs twice. There is an additional column that provides information on the uniqueness, which is provided like 'AAA'. I can add a where clause which would check for the match on the key AND the additional info (let's say 11002_13202_AAA). This will prevent wrong matches.

However, sometimes the additional field might be AAA in table B, which mismatches with the additional field in table A, and there is only one match on the key when joined.

Is it possible to optionally match on the additional field IF more than one join is done, and join anyways when there is only 1 match and the additional field is not necessarily equal between both tables?

Hoping to achieve the following result after joining:

Table A key	Table B key	Additional Field A	Additional Field B
11002_13202	11002_13202	AAA	AAA
11002_13202	NULL	BBB	NULL
92321_32322	92321_32322	BBB	AAA
2310_23123	NULL	AAA	NULL
2310_23123	NULL	BBB	NULL
2310_23123	2310_23123	CCC	CCC
23821_2322	NULL	BBB	NULL
23821_2322	23821_2322	BBB	BBB

Sample data table A

Table A key	Additional Field A
11002_13202	AAA
11002_13202	BBB
92321_32322	BBB
2310_23123	AAA
2310_23123	BBB
2310_23123	CCC
23821_2322	AAA
23821_2322	BBB

Sample data table B

Table B key	Additional Field B
11002_13202	AAA
92321_32322	AAA
2310_23123	CCC
23821_2322	BBB

CodePudding user response：

If my understanding is correct, you could use ROW_NUMBER to only return the value of table B's columns if it's the first row, and order it by if the values match, and then an arbitrary value, I use the "addition field" (they are columns not "fields") here:

SELECT *
INTO dbo.TableA
FROM (VALUES('11002_13202','AAA'),
            ('11002_13202','BBB'),
            ('92321_32322','BBB'))V([Key],Additional);
SELECT *
INTO dbo.TableB
FROM (VALUES('11002_13202','AAA'),
            ('92321_32322','AAA'))V([Key],Additional);
GO

SELECT A.[Key] AS AKey,
       CASE ROW_NUMBER() OVER (PARTITION BY A.[Key] ORDER BY CASE A.[Key] WHEN B.[Key] THEN 0 ELSE 1 END, A.Additional) WHEN 1 THEN B.[Key] END AS BKey,
       A.Additional AS AAdditional,
       CASE ROW_NUMBER() OVER (PARTITION BY A.[Key] ORDER BY CASE A.[Key] WHEN B.[Key] THEN 0 ELSE 1 END, A.Additional) WHEN 1 THEN B.Additional END AS BAdditional
FROM dbo.TableA A
     JOIN dbo.TableB B ON A.[Key] = B.[Key]; --Assumed INNER JOIN, might want to be a LEFT JOIN though?

GO

DROP TABLE dbo.TableA;
DROP TABLE dbo.TableB;

If you prefer, you could put the ROW_NUMBER expression into a CTE/derived table, so you don't need to repeat the logic:

WITH CTE AS(
    SELECT A.[Key] AS AKey,
           B.[Key] AS BKey,
           A.Additional AS AAdditional,
           B.Additional AS BAdditional,
           ROW_NUMBER() OVER (PARTITION BY A.[Key] ORDER BY CASE A.[Key] WHEN B.[Key] THEN 0 ELSE 1 END, A.Additional) AS RN
    FROM dbo.TableA A
         JOIN dbo.TableB B ON A.[Key] = B.[Key]) --Assumed INNER JOIN, might want to be a LEFT JOIN though?
SELECT AKey,
       CASE RN WHEN 1 THEN BKey END AS BKey,
       AAdditional,
       CASE RN WHEN 1 THEN BAdditional END AS BAdditional
FROM CTE;

As mentioned in the comments though, the data is the fundamental problem here, and time should (must) be taken to address that. Getting a "working solution" now as an interim is "fine", but having the repeat odd logic like this only make things harder over time, not easier. You'll likely end up with needing to use VIEWs and (inline table type) FUNCTIONs just to get datasets in a consumable format for your basic queries, which could end up hindering performance if the data is really bad.

Taking time to schedule fixes for your data in the new year should be something on the priority list.