I am pulling reports out of a bookkeeping system. The system has a nasty habit of putting duplicates in the DB which it handles internally (instead of making it clean in the first place!)
as an example - this is the totals table for invoice 125:
------------ ----------- ----------
| invoice_id | code | amount |
------------ ----------- ----------
| 125 | sub_total | 300.0000 |
| 125 | tax | 30.0000 |
| 125 | total | 330.0000 |
| 125 | sub_total | 300.0000 |
| 125 | tax | 30.0000 |
| 125 | total | 330.0000 |
------------ ----------- ----------
and the invoice table for the same id
----- ----------
| id | amount |
----- ----------
| 125 | 330.0000 |
----- ----------
I would like to generate a total sales and taxes for a period (for Australian BAS)
my MWE query (that works if the data is clean) is
select sum(a.amount) as total_sales, sum(c.amount) as total_GST
from 7cn_invoices a
INNER JOIN 7cn_invoice_totals c ON a.id = c.invoice_id
where c.code = 'tax';
However, since there are duplicates in the total table I get total sales double what they should have been. What is the best way to solve this (other than patching the code)?
CodePudding user response:
You can remove duplicates By using a subquery with DISTINCT
CREATE TABLE 7cn_invoice_totals ( `invoice_id` INTEGER, `code` VARCHAR(9), `amount` DECIMAL(10,4) ); INSERT INTO 7cn_invoice_totals (`invoice_id`, `code`, `amount`) VALUES ('125', 'sub_total', '300.0000'), ('125', 'tax', '30.0000'), ('125', 'total', '330.0000'), ('125', 'sub_total', '300.0000'), ('125', 'tax', '30.0000'), ('125', 'total', '330.0000');
CREATE TABLE 7cn_invoices ( `id` INTEGER, `amount` INTEGER ); INSERT INTO 7cn_invoices (`id`, `amount`) VALUES ('125', '330.0000');
select sum(a.amount) as total_sales, sum(c.amount) as total_GST from 7cn_invoices a INNER JOIN (SELECT DISTINCT `invoice_id`, `code`, `amount` FROM 7cn_invoice_totals) c ON a.id = c.invoice_id where c.code = 'tax';
total_sales | total_GST ----------: | --------: 330 | 30.0000
db<>fiddle here