Home > Back-end >  How to avoid duplicates when joining tables in SQL
How to avoid duplicates when joining tables in SQL

Time:03-03

I'd like to run the following analysis in SQL Server, but since my SQL knowledge is rather limited, I can't get to where I'm trying to.

I have the following tables that show how Loans connect to Borrowers, and how Loans connect to Real Estate property collateral pledged by the Borrowers. Some of the Loans are connected to 2 Borrowers. I want to see how much money all Borrowers that connect to an RE Property owe, without double-counting for loans where both borrowers are obligors. For example, assume the following tables:

1. Loans

Loan_ID Total_Claim
Loan_1 50,000
Loan_2 30,000
Loan_3 40,000

2. Borrowers_Loans

Borrower_ID Loan_ID
Borrower_1 Loan_1
Borrower_1 Loan_2
Borrower_2 Loan_2
Borrower_2 Loan_3

3. RE_Properties

Property_ID Loan_ID
Property_1 Loan_1
Property_2 Loan_2
Property_3 Loan_3

Now, I would like to calculate how much in total Borrowers that are connected to an RE property owe, as below:

Property_ID Total_Owed_By_Borrowers
Property_1 50,000
Property_2 120,000
Property_3 70,000

However, running the below double-counts Loan_2 (and shows the sum of Property_2 as 150,000), as it is connected to both Borrowers:

SELECT RE.Property_ID, SUM(L.Total_Claim) FROM RE_Properties RE
  INNER JOIN Borrowers_Loans BL ON BL.Loan_ID = RE.Loan_ID
  INNER JOIN Borrowers_Loans BLA ON BLA.Borrower_ID = BL.Borrower_ID
  INNER JOIN Loans L ON L.Loan_ID = BLA.Loan_ID
GROUP BY RE.Property_ID

Any suggestions? Thanks!


SQL Fiddle

CREATE TABLE RE_Properties (
  Property_ID nvarchar(255),
  Loan_ID nvarchar(255));
  
CREATE TABLE Borrowers_Loans (
  Borrower_ID nvarchar(255),
  Loan_ID nvarchar(255));
  
CREATE TABLE Loans (
  Loan_ID nvarchar(255),
  Total_Claim int);
  
INSERT INTO RE_Properties (Property_ID, Loan_ID) VALUES ('Property_1', 'Loan_1');
INSERT INTO RE_Properties (Property_ID, Loan_ID) VALUES ('Property_2', 'Loan_2');
INSERT INTO RE_Properties (Property_ID, Loan_ID) VALUES ('Property_3', 'Loan_3');

INSERT INTO Borrowers_Loans (Borrower_ID, Loan_ID) VALUES ('Borrower_1', 'Loan_1');
INSERT INTO Borrowers_Loans (Borrower_ID, Loan_ID) VALUES ('Borrower_1', 'Loan_2');
INSERT INTO Borrowers_Loans (Borrower_ID, Loan_ID) VALUES ('Borrower_2', 'Loan_2');
INSERT INTO Borrowers_Loans (Borrower_ID, Loan_ID) VALUES ('Borrower_2', 'Loan_3');

INSERT INTO Loans (Loan_ID, Total_Claim) VALUES ('Loan_1', 50000);
INSERT INTO Loans (Loan_ID, Total_Claim) VALUES ('Loan_2', 30000);
INSERT INTO Loans (Loan_ID, Total_Claim) VALUES ('Loan_3', 40000);

CodePudding user response:

Risk to each borrower is sum of all loans associated with them (assuming they adopt the whole loan if any associated borrower defaults)

select bl.Borrower_ID, sum(loans.total_claim) borrower_total_risk
from Borrowers_Loans as bl
inner join Loans on bl.loan_id = loans.loan_id

Risk to each property is the value of the loans to that property

select p.Property_ID, sum(loans.total_claim) property_loan_total
from RE_Properties as p
inner join Loans on p.Loan_ID = loans.Loan_ID
group by p.Property_ID

If I was to combine this I wouldn't just do a simple sum, e.g.:

SELECT
      RE.Property_ID
    , max(L.Total_Claim) AS Total_Claim
    , count(distinct bl.Borrower_ID) num_borrowers 
    , min(b_risk.borrower_total_risk) as min_borrower_total_risk
    , max(b_risk.borrower_total_risk) as max_borrower_total_risk
FROM RE_Properties RE
  INNER JOIN Loans L ON L.Loan_ID = RE.Loan_ID
  LEFT JOIN Borrowers_Loans BL ON BL.Loan_ID = RE.Loan_ID
  LEFT JOIN (
              select bl.Borrower_ID, sum(loans.total_claim) borrower_total_risk
              from Borrowers_Loans as bl
              left join Loans on bl.loan_id = loans.loan_id
              group by bl.Borrower_ID
              ) as b_risk on bl.Borrower_ID = b_risk.Borrower_ID
GROUP BY RE.Property_ID;

Results:

| Property_ID | Total_Claim | num_borrowers | min_borrower_total_risk | max_borrower_total_risk |
|-------------|-------------|---------------|-------------------------|-------------------------|
|  Property_1 |       50000 |             1 |                   80000 |                   80000 |
|  Property_2 |       30000 |             2 |                   70000 |                   80000 |
|  Property_3 |       40000 |             1 |                   70000 |                   70000 |

Note that the count of borrowers shows 2 when it needs to and that does NOT require multiple joins to the same table.

http://sqlfiddle.com/#!9/9280d4/26/1

CodePudding user response:

There seems to be no need to involve Borrowers. Join only Properties and Loans

RE_Properties           Loans   
Property_ID Loan_ID     Loan_ID Total_Claim
Property_1  Loan_1      Loan_1  50000
Property_2  Loan_2      Loan_2  30000
Property_3  Loan_3      Loan_3  40000
                        Loan_4  10000
                        Loan_5  70000

CodePudding user response:

There may be an easier way, but using your query as a starting point you can ensure each loan is only counted once by grouping by Loan_ID and taking MAX for the amount (or FIRST it doesn't matter since we know its the same value) as follows:

WITH Cte as (
    SELECT RE.Property_ID
        , L.Loan_ID, MAX(L.Total_Claim) Total_Claim
    FROM RE_Properties RE
    INNER JOIN Borrowers_Loans BL ON BL.Loan_ID = RE.Loan_ID
    INNER JOIN Borrowers_Loans BLA ON BLA.Borrower_ID = BL.Borrower_ID
    INNER JOIN Loans L ON L.Loan_ID = BLA.Loan_ID
    GROUP BY RE.Property_ID, L.Loan_ID
)
SELECT Property_ID, SUM(Total_Claim)
FROM Cte
GROUP BY Property_ID;
  • Related