Home > Blockchain >  Select records without duplicate with specific logic
Select records without duplicate with specific logic

Time:10-08

I have a tables A, B and C with a lot of columns (30 ). Main columns for all are Id, RefNumber

Also I have table LinkedEntity where I can match records from different tables (A, B or C)

I need to select all records from table A and also display linked records from B and C

A

Id RefNumber OtherColumns
101 A101 ...
102 A102 ...

B

Id RefNumber OtherColumns
201 B101 ...
202 B102 ...

C

Id RefNumber OtherColumns
301 C101 ...
302 C102 ...

LinkedEntity

Id EntityId LinkedEntityId
1 101 202
2 102 301
3 102 201
4 102 202

Expected result:

Id RefNumber LinkedB LinkedBRefNumb LinkedC LinkedCRefNumb
101 A101 202 B102 NULL NULL
102 A102 201,202 B101,B102 301 C101

First idea to write something like

SELECT A.Id, A.RefNumber, L1.Id, L1.RefNumber, L2.Id, L2.RefNumber
FROM A
LEFT JOIN (SELECT B.Id, B.RefNumber, le.EntityId, le.LinkedEntityId FROM B JOIN LinkedEntity le ON le.EntityId = B.Id OR le.LinkedEntityId = B.Id) L1
ON A.Id = L1.EntityId OR A.Id = L1.LinkedEntityId 
LEFT JOIN (SELECT C.Id, C.RefNumber, le.EntityId, le.LinkedEntityId FROM C JOIN LinkedEntity le ON le.EntityId = C.Id OR le.LinkedEntityId = C.Id) L2
ON A.Id = L2.EntityId OR A.Id = L2.LinkedEntityId

But this query returns duplicates records of A table. Is there any way to remove duplicates and have joined values of linkedEntities? (Maybe using STRING_AGG) ?

CodePudding user response:

As Larnu suggested, you need multiple STRING_AGG expressions, and I would use OUTER APPLY instead of trying to work them into your existing join.

SELECT A.Id, A.RefNumber,
  B.LinkedB, B.LinkedBRefNumb,
  C.LinkedC, C.LinkedCRefNumb
FROM dbo.A
OUTER APPLY 
(
  SELECT
    LinkedB        = STRING_AGG(B.Id, ','), 
    LinkedBRefNumb = STRING_AGG(B.RefNumber, ',')
  FROM dbo.LinkedEntity AS LB
  INNER JOIN dbo.B ON B.Id IN (LB.EntityId, LB.LinkedEntityId)
  WHERE A.Id IN (LB.EntityId, LB.LinkedEntityId)
) AS B
OUTER APPLY 
(
  SELECT
    LinkedC        = STRING_AGG(C.Id, ','), 
    LinkedCRefNumb = STRING_AGG(C.RefNumber, ',')
  FROM dbo.LinkedEntity AS LC
  INNER JOIN dbo.C ON C.Id IN (LC.EntityId, LC.LinkedEntityId)
  WHERE A.Id IN (LC.EntityId, LC.LinkedEntityId)
) AS C;

Results:

Id RefNumber LinkedB LinkedBRefNumb LinkedC LinkedCRefNumb
101 A101 202 B102 null null
102 A102 201,202 B101,B102 301 C101

If you want to guarantee the strings are aggregated in the same order, you can add WITHIN GROUP (ORDER BY <something>.Id)

Note that this solution assumes something that Charlie brought up: that A, B, and C all contain ranges of Id values that are different from each other. This is true for your sample data, but this seems very dumbed down so I'd be very cautious against your real data.

CodePudding user response:

Here's a simple approach using joins that I tested with some additional duplicate entries in LinkedEntity (your current design allows this, you may fix this with a composite key of EntityId and LinkedEntityId and removing the Id from this table).

You do need to group your records using STRING_AGG or one of these other approaches for older versions of sql server. Moreover, if you are interested in ordering the grouped concatenated data, the docs provide additional examples how to order data.

This uses the IN operator and left joins to retrieve linked entities and the distinct keyword to remove duplicates.

Query 1

SELECT
    AId as Id,
    ARefNumber as RefNumber,
    STRING_AGG(BId,',') as LinkedB,
    STRING_AGG(BRefNumber,',') as LinkedBRefNumb,
    STRING_AGG(CId,',') as LinkedC,
    STRING_AGG(CRefNumber,',') as LinkedCRefNumb
FROM (
    SELECT DISTINCT
        A.Id as AId,
        A.RefNumber as ARefNumber,
        B.Id as BId,
        B.RefNumber as BRefNumber,
        C.Id as CId,
        C.RefNumber as CRefNumber
    FROM 
        A
    LEFT JOIN
        LinkedEntity le ON A.Id IN (le.EntityId,le.LinkedEntityId)
    LEFT JOIN
        B ON B.Id IN  (le.EntityId,le.LinkedEntityId) 
    LEFT JOIN
        C ON C.Id IN  (le.EntityId,le.LinkedEntityId) 
) t
GROUP BY
    AId,
    ARefNumber
Id RefNumber LinkedB LinkedBRefNumb LinkedC LinkedCRefNumb
101 A101 202 B102 null null
102 A102 201,202 B101,B102 301 C101

View working demo db fiddle

Let me know if this works for you.

CodePudding user response:

Have you tried foreign key (for what i understood, i'm not sure the table are linked by id or refnumber):

ALTER TABLE A
   ADD CONSTRAINT Foreign1 FOREIGN KEY (Id)
      REFERENCES B (Id)
      ON DELETE RESTRICT
      ON DELETE RESTRICT

an so on for b to c and from Linkedentity to a, b, c. Here the ms official page https://docs.microsoft.com/it-it/sql/relational-databases/tables/create-foreign-key-relationships?view=sql-server-ver15

Then all you have to do should be:

SELECT * FROM A JOIN B JOIN C JOIN LinkedEntity

CodePudding user response:

SELECT
a_id,
a_RefNumber,
GROUP_AGGR(b_id) AS b_aggr_id,
GROUP_AGGR(b_RefNumber) AS b_aggr_ref,
GROUP_AGGR(c_id) as c_aggr,
GROUP_AGGR(c_RefNumber) AS c_aggr_ref
FROM (

    SELECT
    A.id AS a_id,
    A.RefNumber AS a_RefNumber,
    B.id AS b_id,
    B.RefNumber AS b_RefNumber,
    NULL AS c_id,
    NULL AS c_RefNumber
    FROM A
    LEFT JOIN LinkedEntity X ON X.EntityId=A.id
    LEFT JOIN B ON X.LinkedEntityId=B.id

    UNION ALL

    SELECT
    A.id AS a_id,
    A.RefNumber AS a_RefNumber,
    NULL AS b_id,
    NULL AS b_RefNumber,
    C.id AS c_id,
    C.RefNumber AS c_RefNumber
    FROM A
    LEFT JOIN LinkedEntity X ON X.EntityId=A.id
    LEFT JOIN C ON X.LinkedEntityId=C.id

) AS matrix
GROUP BY a_id, a_RefNumber
ORDER BY 1
  • Related