Home > Net >  Filtering duplicates on join without effecting first table
Filtering duplicates on join without effecting first table

Time:07-29

Basically, I have the following data queried from three tables

{   _Document330    } {    _Document224_VT3836      } {   _Document224         }
_IDRef | _Number     | _Document224_IDRRef | _LineNo | _IDRef         | _Marked |
0x9468 | К0000000598 | 0x5972941689C3      |  1      | 0x5972941689C3 | 0x01    |
0x9468 | К0000000598 | 0x5A474968456C      |  1      | 0x5A474968456C | 0x00    |
0x8543 | K0000000009 | NULL                | NULL    | NULL           | NULL    |

This is the only order I can join this tables

SELECT *
FROM dbo._Document330 T12
LEFT OUTER JOIN dbo._Document224_VT3836 T41
ON T41._Fld3852_RRRef = T12._IDRRef
LEFT OUTER JOIN dbo._Document224 T42
ON T42._IDRRef = T41._Document224_IDRRef

This is what I need in output

{   _Document330    } {    _Document224_VT3836      } {   _Document224         }
_IDRef | _Number     | _Document224_IDRRef | _LineNo | _IDRef         | _Marked |
0x9468 | К0000000598 | 0x5A474968456C      |  1      | 0x5A474968456C | 0x00    |
0x8543 | K0000000009 | NULL                | NULL    | NULL           | NULL    |

If I select this data like this:

SELECT * 
FROM dbo._Document330 T12
LEFT OUTER JOIN dbo._Document224_VT3836 T41
ON T41._Fld3852_RRRef = T12._IDRRef
RIGHT JOIN dbo._Document224 T42
ON T42._IDRRef = T41._Document224_IDRRef
AND T42._Marked = 0x00

Then I would not have a row with the number K0000000009

CodePudding user response:

It looks like you're doing

SELECT *
FROM a
  LEFT JOIN b 

and your problem is that for each row in a you're getting more than one output row becuase jour JOIN condition matches more than one row from b to a row in a.

The solution in this situation is to use a sub-query to make from b a new table whose key is the join condition. So you get something that looks like

SELECT *
FROM a
  LEFT JOIN (
    SELECT foo, AGG(bar),...
    FROM b
    GROUP BY foo
  ) AS b
    ON a.foo = b.foo

The trick then is to choose the aggregate functions AGG so that you pick appropriate values to represent the records from b. Sometimes simple aggregates aren't appropriate, for example you want the first (or last -- reverse the sort order) record which you can obtain with something like

SELECT *
FROM a
  LEFT JOIN (
    SELECT *,
      ROW_NUMBER() OVER (PARTITOIN BY foo ORDER BY bar) AS RowNumber
    FROM b
  ) AS b
    ON a.foo = b.foo AND b.RowNumber = 1
  • Related