Home > Software design >  sql sum - dealing with dirty duplicates in join table
sql sum - dealing with dirty duplicates in join table

Time:02-26

I am pulling reports out of a bookkeeping system. The system has a nasty habit of putting duplicates in the DB which it handles internally (instead of making it clean in the first place!)

as an example - this is the totals table for invoice 125:

 ------------ ----------- ---------- 
| invoice_id | code      | amount   |
 ------------ ----------- ---------- 
|        125 | sub_total | 300.0000 |
|        125 | tax       |  30.0000 |
|        125 | total     | 330.0000 |
|        125 | sub_total | 300.0000 |
|        125 | tax       |  30.0000 |
|        125 | total     | 330.0000 |
 ------------ ----------- ---------- 

and the invoice table for the same id

 ----- ---------- 
| id  | amount   |
 ----- ---------- 
| 125 | 330.0000 |
 ----- ---------- 

I would like to generate a total sales and taxes for a period (for Australian BAS)

my MWE query (that works if the data is clean) is

select sum(a.amount) as total_sales, sum(c.amount) as total_GST
from 7cn_invoices a 
    INNER JOIN 7cn_invoice_totals c ON a.id = c.invoice_id
where c.code = 'tax';

However, since there are duplicates in the total table I get total sales double what they should have been. What is the best way to solve this (other than patching the code)?

CodePudding user response:

You can remove duplicates By using a subquery with DISTINCT

CREATE TABLE 7cn_invoice_totals  (
  `invoice_id` INTEGER,
  `code` VARCHAR(9),
  `amount` DECIMAL(10,4)
);

INSERT INTO 7cn_invoice_totals 
  (`invoice_id`, `code`, `amount`)
VALUES
  ('125', 'sub_total', '300.0000'),
  ('125', 'tax', '30.0000'),
  ('125', 'total', '330.0000'),
  ('125', 'sub_total', '300.0000'),
  ('125', 'tax', '30.0000'),
  ('125', 'total', '330.0000');
CREATE TABLE 7cn_invoices  (
  `id` INTEGER,
  `amount` INTEGER
);

INSERT INTO 7cn_invoices 
  (`id`, `amount`)
VALUES
  ('125', '330.0000');
select sum(a.amount) as total_sales, sum(c.amount) as total_GST
from  7cn_invoices a 
    INNER JOIN (SELECT DISTINCT `invoice_id`, `code`, `amount`  FROM 7cn_invoice_totals) c ON a.id = c.invoice_id
where c.code = 'tax';
total_sales | total_GST
----------: | --------:
        330 |   30.0000

db<>fiddle here

  • Related