Home > Mobile >  SQL optional match based on number of matches in other table & optional column match
SQL optional match based on number of matches in other table & optional column match

Time:12-24

I am trying to figure out how to solve an issue with a left join with two large tables (150k x 300k rows).

I have a key that should be unique, but sometimes isn't, in the format of 11002_13202. It is possible that this value occurs twice. There is an additional column that provides information on the uniqueness, which is provided like 'AAA'. I can add a where clause which would check for the match on the key AND the additional info (let's say 11002_13202_AAA). This will prevent wrong matches.

However, sometimes the additional field might be AAA in table B, which mismatches with the additional field in table A, and there is only one match on the key when joined.

Is it possible to optionally match on the additional field IF more than one join is done, and join anyways when there is only 1 match and the additional field is not necessarily equal between both tables?

Hoping to achieve the following result after joining:

Table A key Table B key Additional Field A Additional Field B
11002_13202 11002_13202 AAA AAA
11002_13202 NULL BBB NULL
92321_32322 92321_32322 BBB AAA
2310_23123 NULL AAA NULL
2310_23123 NULL BBB NULL
2310_23123 2310_23123 CCC CCC
23821_2322 NULL BBB NULL
23821_2322 23821_2322 BBB BBB

Sample data table A

Table A key Additional Field A
11002_13202 AAA
11002_13202 BBB
92321_32322 BBB
2310_23123 AAA
2310_23123 BBB
2310_23123 CCC
23821_2322 AAA
23821_2322 BBB

Sample data table B

Table B key Additional Field B
11002_13202 AAA
92321_32322 AAA
2310_23123 CCC
23821_2322 BBB

CodePudding user response:

If my understanding is correct, you could use ROW_NUMBER to only return the value of table B's columns if it's the first row, and order it by if the values match, and then an arbitrary value, I use the "addition field" (they are columns not "fields") here:

SELECT *
INTO dbo.TableA
FROM (VALUES('11002_13202','AAA'),
            ('11002_13202','BBB'),
            ('92321_32322','BBB'))V([Key],Additional);
SELECT *
INTO dbo.TableB
FROM (VALUES('11002_13202','AAA'),
            ('92321_32322','AAA'))V([Key],Additional);
GO

SELECT A.[Key] AS AKey,
       CASE ROW_NUMBER() OVER (PARTITION BY A.[Key] ORDER BY CASE A.[Key] WHEN B.[Key] THEN 0 ELSE 1 END, A.Additional) WHEN 1 THEN B.[Key] END AS BKey,
       A.Additional AS AAdditional,
       CASE ROW_NUMBER() OVER (PARTITION BY A.[Key] ORDER BY CASE A.[Key] WHEN B.[Key] THEN 0 ELSE 1 END, A.Additional) WHEN 1 THEN B.Additional END AS BAdditional
FROM dbo.TableA A
     JOIN dbo.TableB B ON A.[Key] = B.[Key]; --Assumed INNER JOIN, might want to be a LEFT JOIN though?

GO

DROP TABLE dbo.TableA;
DROP TABLE dbo.TableB;

If you prefer, you could put the ROW_NUMBER expression into a CTE/derived table, so you don't need to repeat the logic:

WITH CTE AS(
    SELECT A.[Key] AS AKey,
           B.[Key] AS BKey,
           A.Additional AS AAdditional,
           B.Additional AS BAdditional,
           ROW_NUMBER() OVER (PARTITION BY A.[Key] ORDER BY CASE A.[Key] WHEN B.[Key] THEN 0 ELSE 1 END, A.Additional) AS RN
    FROM dbo.TableA A
         JOIN dbo.TableB B ON A.[Key] = B.[Key]) --Assumed INNER JOIN, might want to be a LEFT JOIN though?
SELECT AKey,
       CASE RN WHEN 1 THEN BKey END AS BKey,
       AAdditional,
       CASE RN WHEN 1 THEN BAdditional END AS BAdditional
FROM CTE;

As mentioned in the comments though, the data is the fundamental problem here, and time should (must) be taken to address that. Getting a "working solution" now as an interim is "fine", but having the repeat odd logic like this only make things harder over time, not easier. You'll likely end up with needing to use VIEWs and (inline table type) FUNCTIONs just to get datasets in a consumable format for your basic queries, which could end up hindering performance if the data is really bad.

Taking time to schedule fixes for your data in the new year should be something on the priority list.

  • Related