Home > Software engineering >  Assign group_id to all linked records in many-to-many table
Assign group_id to all linked records in many-to-many table

Time:08-28

There are 3 tables in my DB; Table 1 has 'Collateral', Table 2 has 'Loans', Table 3 is a multi-link table between 1 and 2; let's call it 'Loan_Collateral_link'.

A collateral can belong to 1 or more Loans, and a Loan can have multiple Collaterals.

What I want to achieve, is create a separate result set (or table) in which I group together all Loans and Collaterals which are in any way linked to eachother through different Loans and/or Collaterals.

Loans:

ID name
Loan1 ABC
Loan2 DEF
Loan3 GHI
Loan4 JKL
Loan5 MNO

Collaterals:

ID name
Coll1 Col1
Coll2 Col2
Coll3 Col3

Loan_Collateral_link:

Loan_ID Collateral_ID
Loan1 Col1
Loan2 Col1
Loan2 Col3
Loan3 Col2
Loan4 Col2
Loan5 Col1
Loan5 Col3

So you see Loan1, Loan2 and Loan5 are sharing col1, so I want them to be grouped. Col3 should be in that same group, as it is linked through Loan2.

Loan4 and Loan3 are linked through Col2, so they should be a separate group.

The resultset would be something as per below:

Groups:

Group_ID item_ID
Group1 Loan1
Group1 Loan2
Group1 Loan5
Group1 Col1
Group1 Col3
Group2 Loan3
Group2 Loan4
Group2 Col2

I'm trying to write a script, probably using a loop? to get this done. I was thinking to loop over each record in the Loan_Collateral_Link table, write every link that I find for the record into a temp_table. Then loop over this temp_table to find all linked records. However, I can't really seem to work it out conceptually, as the loop should somehow reference itself.

Something like;

--ForEach Loan_Collateral_Link Loan;
    --If not exists - Insert LOAN_ID into TempTable
    --If not exist - Insert COL_ID into TempTable
        --ForEach TempTable COL_ID;
            --Select * FROM Loan_Collateral_Link where Loan_ID or Col_ID matches
                --> If not exists - Insert LOAN_ID / COL_ID into TempTable

But this seems a bit like an infite loop perhaps?

Perhaps I should order all Loans, than go over Loan per Loan, add all the Col's, and in the loop per loan select all other loans with this loan. Than go over all the new Cols and select all matching Loans. However, than I have new Loans, so I need to go back and fetch their possible Cols. So again, how many times do I loop? Use a flag per collateral and keep looping until all flags are fulfilled?

CodePudding user response:

You can achieve that you want using UNION ALL operator like this:

First, I would like to know if loan_id appeared for the first time or not. I use the ROW_NUMBER function in the lcl_rn CTE for this.

Second, I'd like to get the distinct collateral_id values and assign them group ids. I do it in the groups CTE.

I then get the loan_id values ​​that occur for the first time (they have rn equal to 1).

Then I get those groups (collateral_id values) whose loan_id values ​​appeared for the first time. I use EXISTS operator to check if the row with particular collateral_id and loan_id values has rn equal to 1.

And then I get those collateral_id values whose loan_id values appeared not for the first time (with rn greater than 1) and join them to the group where they appeared for the first time.

Finally, I combine all these queries with the UNION ALL operator.

Query

WITH lcl_rn AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (PARTITION BY loan_id ORDER BY collateral_id, loan_id) AS rn
    FROM loan_collateral_link 
),
groups AS (
    SELECT 
        *,
        'Group'   CAST(ROW_NUMBER() OVER (ORDER BY collateral_id) AS VARCHAR(MAX)) AS id
    FROM (SELECT DISTINCT collateral_id FROM loan_collateral_link) c
)
SELECT 
    g.id AS group_id,
    l.loan_id AS item_id
FROM lcl_rn l
JOIN groups g ON l.collateral_id = g.collateral_id
WHERE rn = 1
UNION ALL
SELECT DISTINCT
    id,
    collateral_id
FROM groups 
WHERE EXISTS (
    SELECT 1 
    FROM lcl_rn 
    WHERE lcl_rn.collateral_id = groups.collateral_id AND lcl_rn.rn = 1
)
UNION ALL
SELECT 
    g.id,
    q.collateral_id
FROM (    
    SELECT 
        DISTINCT l2.collateral_id AS group_collateral_id,
        l1.collateral_id AS collateral_id
    FROM lcl_rn l1
    JOIN lcl_rn l2 ON l1.loan_id = l2.loan_id AND l2.rn = 1
    WHERE l1.rn > 1
) q
JOIN groups g ON g.collateral_id = q.group_collateral_id
ORDER BY id, item_id

Output

group_id item_id
Group1 Col1
Group1 Col3
Group1 Loan1
Group1 Loan2
Group1 Loan5
Group2 Col2
Group2 Loan3
Group2 Loan4

db<>fiddle here

CodePudding user response:

A while ago I wrote an answer to the question How to find all connected subgraphs of an undirected graph. Your question here looks like the same problem.

Essentially, it treats the data as edges in a graph and traverses recursively all edges of the graph, stopping when the loop is detected. Then it puts all found loops in groups and gives each group a number.

For the detailed explanation of the query see that answer. I adapted that query to your schema here.

Also, this problem can be solved in linear time. Wikipedia has description of the algorithm. But, the T-SQL method shown here is far from optimal. If you still want to do it in T-SQL you can try the following. Instead of trying to find all groups at once, pick a single starting Identifier (add WHERE Ident='Loan1' to CTE_Idents and CTE_Pairs), then delete from the big table all identifiers that were appended to this group. Repeat until the big table is empty. At least you could see the progress. Proper indexes on Ident columns would not hurt as well.

Sample data

DECLARE @T TABLE (LoanID varchar(50), CollateralID varchar(50));
INSERT INTO @T (LoanID, CollateralID) VALUES
('Loan1', 'Col1'),
('Loan2', 'Col1'),
('Loan2', 'Col3'),
('Loan3', 'Col2'),
('Loan4', 'Col2'),
('Loan5', 'Col1'),
('Loan5', 'Col3'),

-- plus an extra group with some more interesting links
('Loan80', 'Col80'),
('Loan81', 'Col80'),
('Loan80', 'Col81'),
('Loan81', 'Col81'),
('Loan82', 'Col81')
;

Query

WITH
CTE_Idents
AS
(
    SELECT LoanID AS Ident
    FROM @T

    UNION

    SELECT CollateralID AS Ident
    FROM @T
)
,CTE_Pairs
AS
(
    SELECT LoanID AS Ident1, CollateralID AS Ident2
    FROM @T

    UNION

    SELECT CollateralID AS Ident1, LoanID AS Ident2
    FROM @T
)
,CTE_Recursive
AS
(
    SELECT
        CAST(CTE_Idents.Ident AS varchar(8000)) AS AnchorIdent 
        , Ident1
        , Ident2
        , CAST(','   Ident1   ','   Ident2   ',' AS varchar(8000)) AS IdentPath
        , 1 AS Lvl
    FROM 
        CTE_Pairs
        INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1

    UNION ALL

    SELECT 
        CTE_Recursive.AnchorIdent 
        , CTE_Pairs.Ident1
        , CTE_Pairs.Ident2
        , CAST(CTE_Recursive.IdentPath   CTE_Pairs.Ident2   ',' AS varchar(8000)) AS IdentPath
        , CTE_Recursive.Lvl   1 AS Lvl
    FROM
        CTE_Pairs
        INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
    WHERE
        CTE_Recursive.IdentPath NOT LIKE CAST('%,'   CTE_Pairs.Ident2   ',%' AS varchar(8000))
)
,CTE_RecursionResult
AS
(
    SELECT AnchorIdent, Ident1, Ident2
    FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
    SELECT AnchorIdent, Ident1 AS Ident
    FROM CTE_RecursionResult

    UNION

    SELECT AnchorIdent, Ident2 AS Ident
    FROM CTE_RecursionResult
)
SELECT
    CTE_Idents.Ident
    ,CASE WHEN CA_Data.XML_Value IS NULL 
    THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END AS GroupMembers
    ,DENSE_RANK() OVER(ORDER BY 
        CASE WHEN CA_Data.XML_Value IS NULL 
        THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END
    ) AS GroupID
FROM
    CTE_Idents
    CROSS APPLY
    (
        SELECT CTE_CleanResult.Ident ','
        FROM CTE_CleanResult
        WHERE CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
        ORDER BY CTE_CleanResult.Ident FOR XML PATH(''), TYPE
    ) AS CA_XML(XML_Value)
    CROSS APPLY
    (
        SELECT CA_XML.XML_Value.value('.', 'NVARCHAR(MAX)')
    ) AS CA_Data(XML_Value)
WHERE
    CTE_Idents.Ident IS NOT NULL
ORDER BY GroupID, Ident;

Result

 -------- ----------------------------------- --------- 
| Ident  |           GroupMembers            | GroupID |
 -------- ----------------------------------- --------- 
| Col1   | Col1,Col3,Loan1,Loan2,Loan5,      |       1 |
| Col3   | Col1,Col3,Loan1,Loan2,Loan5,      |       1 |
| Loan1  | Col1,Col3,Loan1,Loan2,Loan5,      |       1 |
| Loan2  | Col1,Col3,Loan1,Loan2,Loan5,      |       1 |
| Loan5  | Col1,Col3,Loan1,Loan2,Loan5,      |       1 |
| Col2   | Col2,Loan3,Loan4,                 |       2 |
| Loan3  | Col2,Loan3,Loan4,                 |       2 |
| Loan4  | Col2,Loan3,Loan4,                 |       2 |
| Col80  | Col80,Col81,Loan80,Loan81,Loan82, |       3 |
| Col81  | Col80,Col81,Loan80,Loan81,Loan82, |       3 |
| Loan80 | Col80,Col81,Loan80,Loan81,Loan82, |       3 |
| Loan81 | Col80,Col81,Loan80,Loan81,Loan82, |       3 |
| Loan82 | Col80,Col81,Loan80,Loan81,Loan82, |       3 |
 -------- ----------------------------------- --------- 
  • Related