Home > Software design >  Multi Join Table, Multiple Sums
Multi Join Table, Multiple Sums

Time:10-01

I've got 3 tables I need to work with:

CREATE TABLE invoices (
  id INTEGER,
  number VARCHAR(256)
)

CREATE TABLE items (
  invoice_id INTEGER,
  total DECIMAL
)

CREATE TABLE payments (
  invoice_id INTEGER,
  total DECIMAL
)

I need a result set along the lines of:

invoices.id invoices.number item_total payment_total oustanding_balance
00001 i82 42.50 42.50 00.00
00002 i83 89.99 9.99 80.00

I tried

SELECT 
  invoices.*, 
  SUM(items.total) AS item_total, 
  SUM(payments.total) AS payment_total, 
  SUM(items.total) - SUM(payments.total) AS oustanding_balance 
FROM 
  invoices 
  LEFT OUTER JOIN items ON items.invoice_id = invoices.id 
  LEFT OUTER JOIN payments ON payments.invoice_id = invoices.id 
GROUP BY 
  invoices.id

But that fails. The sum for payments ends up wrong since I'm doing 2 joins here and I end up counting payments multiple times.

I ended up with

SELECT
  invoices.*,
  invoices.item_total - invoices.payment_total AS oustanding_balance
FROM
  (
    SELECT invoices.*,
    (SELECT SUM(items.total FROM items WHERE items.invoice_id = invoices.id) AS item_total,
    (SELECT SUM(payments.total FROM payments WHERE payments.invoice_id = invoices.id) AS payment_total
  ) AS invoices

But ... that feels ugly. Now I've got subqueries going on everywhere. It DOES work, but I'm concerned about performance?

There has to be some good way to do this with joins - I'm sure I'm missing something super obvious?

CodePudding user response:

As you say the sum behavior with multiple joins is normal and working with sub queries (Or CTE for SQl Server) is not a bad practice.

Doing such GOUP BY on an ID and a total in sub queries won't significantly downgrade your performance (depending on your tables sizes).

Another solution could be doing one SUM sub query for each column you need. It would be easier to understand this way I think :

SELECT
  invoices.id
  , i_total.total as item_total
  , p_total.total aspayment_total
  , ( i_total.total - p_total.total) as outstanding_balance
FROM 
  invoices 
LEFT JOIN (
  SELECT invoice_id, SUM(total) as total FROM items GROUP BY invoice_id
) i_total 
  ON i_total.invoice_id = invoices.id
LEFT JOIN (
  SELECT invoice_id, SUM(total) as total FROM payments GROUP BY invoice_id
) p_total 
  ON p_total.invoice_id = invoices.id

CodePudding user response:

I think a common table expression (or in this case two CTEs) will give you what you want. You are using something called a scalar, which is precisely speaking not wrong, but as you correctly identified is ugly, hard to read, hard to maintain and can be non-performant in many situations.

CTE essentially take a query and makes it "behave" like a table. We define it once and then we can refer to it later.

with item_data as (
  SELECT invoice_id, SUM(total) as item_total
  FROM items
  group by invoice_id
),
payment_data as (
  SELECT invoice_id, SUM(total) as payment_total
  FROM payments
  group by invoice_id
)
select
  i.*,
  id.item_total - pd.payment_total as outstanding_balance
from
  invoices i
  join item_data id on i.invoice_id = id.invoice_id
  join payment_data pd on i.invoice_id = pd.invoice_id

Untested, but hopefully you get the idea.

  • Related