sql sum - dealing with dirty duplicates in join table-CodePudding

I am pulling reports out of a bookkeeping system. The system has a nasty habit of putting duplicates in the DB which it handles internally (instead of making it clean in the first place!)

as an example - this is the totals table for invoice 125:

 ------------ ----------- ---------- 
| invoice_id | code      | amount   |
 ------------ ----------- ---------- 
|        125 | sub_total | 300.0000 |
|        125 | tax       |  30.0000 |
|        125 | total     | 330.0000 |
|        125 | sub_total | 300.0000 |
|        125 | tax       |  30.0000 |
|        125 | total     | 330.0000 |
 ------------ ----------- ----------

and the invoice table for the same id

 ----- ---------- 
| id  | amount   |
 ----- ---------- 
| 125 | 330.0000 |
 ----- ----------

I would like to generate a total sales and taxes for a period (for Australian BAS)

my MWE query (that works if the data is clean) is

select sum(a.amount) as total_sales, sum(c.amount) as total_GST
from 7cn_invoices a 
    INNER JOIN 7cn_invoice_totals c ON a.id = c.invoice_id
where c.code = 'tax';

However, since there are duplicates in the total table I get total sales double what they should have been. What is the best way to solve this (other than patching the code)?

CodePudding user response：

You can remove duplicates By using a subquery with DISTINCT

CREATE TABLE 7cn_invoice_totals  (
  `invoice_id` INTEGER,
  `code` VARCHAR(9),
  `amount` DECIMAL(10,4)
);

INSERT INTO 7cn_invoice_totals 
  (`invoice_id`, `code`, `amount`)
VALUES
  ('125', 'sub_total', '300.0000'),
  ('125', 'tax', '30.0000'),
  ('125', 'total', '330.0000'),
  ('125', 'sub_total', '300.0000'),
  ('125', 'tax', '30.0000'),
  ('125', 'total', '330.0000');

CREATE TABLE 7cn_invoices  (
  `id` INTEGER,
  `amount` INTEGER
);

INSERT INTO 7cn_invoices 
  (`id`, `amount`)
VALUES
  ('125', '330.0000');

select sum(a.amount) as total_sales, sum(c.amount) as total_GST
from  7cn_invoices a 
    INNER JOIN (SELECT DISTINCT `invoice_id`, `code`, `amount`  FROM 7cn_invoice_totals) c ON a.id = c.invoice_id
where c.code = 'tax';

total_sales | total_GST
----------: | --------:
        330 |   30.0000

db<>fiddle here