Home > Back-end >  How to deal with SQL Aggregate Functions when Grouping deeply hiearched data
How to deal with SQL Aggregate Functions when Grouping deeply hiearched data

Time:11-17

Given the following scenario:

I have a Database with 5 tables:

  1. currency (iso_number, iso_code),
  2. product (id, name, current_price),
  3. sale (id, time_of_sale, currency_items_sold_in),
  4. sale_lines (id, sale_id, product_id, price_paid, quantity),
  5. cash_transactions (id, sale_id, received_currency_id, converted_currency_id, received_amount, converted_amount)

The setup allows to store what kind of currency was originally given by the customer, and what currency it was exchanged to internally and the original amount and the exchanged(converted) amount.

I want to be able to find all sales that match certain criteria (time period, seller, store) etc. ((left out for simplicity)).

For all those sales i will join related data which is sale_lines and cash_transactions. Now the currency on sale_lines always matches the currency on the related sale. However for cash_transactions the received_amount/received_currency can differ from the currency on the sale. Allthough converted_currency/converted_amount is stored on the cash_transaction line it should follow the sale.

The problem arises when i try to perform SUM of certain fields, when you start joining One-To-Many relationships and then perform aggregate functions like SUM even though you specify the correct GROUP BY's behind the scene the SQL Server still SUMs the duplicated lines that would have been required to display the data had we not used a GROUP BY.

The problem is also described here: https://wikido.isoftdata.com/index.php/The_GROUPing_pitfall

Following the solution from the article above i should in my case LEFT JOIN the aggregate results of each sale onto the outer query.

But what can i do when the sale_lines currency match the sale, but cash_transactions currency can differ from the sale.?

I've tried creating the following SQL Fiddle which inserts some test data and highlights the problems: http://sqlfiddle.com/#!17/54a7b/15

In the fiddle i have created 2 Sales, where the items are sold in DKK(208) and 752(SEK). On the first sale there are 2 sale lines, and 2 cash transactions the first transaction is directly DKK => DKK, and the second transaction is SEK => DKK.

On the second sale there are also 2 sale lines, and 2 cash transactions the first transaction is NOK => DKK, and the second transaction is directly DKK => DKK.

In the last query of the fiddle it can be observed that the total_received_amount is bogus as it is a mix of DKK,SEK and NOK does not provide much value.

I want suggestions on how to fetch the data properly, i do not care if i have to perform additional "logic" on the server-side (PHP) in order to de-duplicate some of the data as long as the sums are correct.

Any suggestions are much appreciated.

DDL FROM FIDDLE

CREATE TABLE currency (
  iso_number CHARACTER VARYING(3) PRIMARY KEY,
  iso_code CHARACTER VARYING(3)
);

INSERT INTO currency(iso_number, iso_code) VALUES ('208','DKK'), ('752','SEK'), ('572','NOK');

CREATE TABLE product (
  id SERIAL PRIMARY KEY,
  name CHARACTER VARYING(12),
  current_price INTEGER
);

INSERT INTO product(id,name,current_price) VALUES (1,'icecream',200), (2,'sunglasses',300);

CREATE TABLE sale (
  id SERIAL PRIMARY KEY,
  time_of_sale TIMESTAMP,
  currency_items_sold_in CHARACTER VARYING(3)
);

INSERT INTO sale(id, time_of_sale, currency_items_sold_in) 
VALUES 
(1, CURRENT_TIMESTAMP, '208'),
(2, CURRENT_TIMESTAMP, '752')
;

CREATE TABLE sale_lines (
  id SERIAL PRIMARY KEY,
  sale_id INTEGER,
  product_id INTEGER,
  price_paid INTEGER,
  quantity FLOAT
);

INSERT INTO sale_lines(id, sale_id, product_id, price_paid, quantity)
VALUES 
(1, 1, 1, 200, 1.0),
(2, 1, 2, 300, 1.0),

(3, 2, 1, 100, 1.0),
(4, 2, 1, 100, 1.0)
;
        


CREATE TABLE cash_transactions (
  id SERIAL PRIMARY KEY,
  sale_id INTEGER,
  received_currency_id CHARACTER VARYING(3),
  converted_currency_id CHARACTER VARYING(3),
  received_amount INTEGER,
  converted_amount INTEGER
);


INSERT INTO cash_transactions(id, sale_id, received_currency_id, converted_currency_id, received_amount, converted_amount)
VALUES
(1, 1, '208', '208', 200, 200),
(2, 1, '752', '208', 400, 300),

(3, 2, '572', '208', 150, 100),
(4, 2, '208', '208', 100, 100)
;

QUERIES FROM FIDDLE

--SELECT * FROM currency;
--SELECT * FROM product;
--SELECT * FROM sale;
--SELECT * FROM sale_lines;
--SELECT * FROM cash_transactions;


--- Showing the sales with duplicated lines to 
--- fit joined data for OneToMany SaleLines, and OneToMany cash transactions.
SELECT *
FROM sale s
LEFT JOIN sale_lines sl ON sl.sale_id = s.id
LEFT JOIN cash_transactions ct ON ct.sale_id = s.id;



--- Grouping the data by important identifier "currency_items_sold_in".
--- The SUM of sl.price_paid is wrong as it SUMS the duplicated lines as well.
SELECT 
  s.currency_items_sold_in, 
  SUM(sl.price_paid) as "price_paid"
FROM sale s
LEFT JOIN sale_lines sl ON sl.sale_id = s.id
LEFT JOIN cash_transactions ct ON ct.sale_id = s.id
GROUP BY s.currency_items_sold_in;

--- To solve this the SUM can be joined via the "Monkey-Poop" method.
--- Here the problem arises, the SUMS for cash_transaction.received_amount and cash_transaction.converted_amount cannot be relied upon
--- As those fields themselves rely on cash_transaction.received_currency_id and cash_transaction.converted_currency_id
SELECT 
  s.currency_items_sold_in, 
  SUM(sale_line_aggregates.price_paid) as "total_price_paid",
  SUM(cash_transaction_aggregates.converted_amount) as "total_converted_amount",
  SUM(cash_transaction_aggregates.received_amount) as "total_received_amount"
FROM sale s
LEFT JOIN (
  SELECT 
    sale_id,
    SUM(price_paid) AS price_paid
  FROM sale_lines
  GROUP BY sale_id
) AS sale_line_aggregates ON sale_line_aggregates.sale_id = s.id

LEFT JOIN (
  SELECT
    sale_id,
    SUM(converted_amount) as converted_amount,
    SUM(received_amount) as received_amount
  FROM cash_transactions
  GROUP BY sale_id
) AS cash_transaction_aggregates ON cash_transaction_aggregates.sale_id = s.id
GROUP BY s.currency_items_sold_in;

CodePudding user response:

You could calculate for every amount grouped per currency in the subqueries. Then join them on the currency.

And with a CTE you can make sure each subquery is using the same sales.

WITH CTE_SALE AS (
  SELECT
   id as sale_id, 
   currency_items_sold_in AS iso_number
  FROM sale
)
SELECT curr.iso_code AS currency
, COALESCE(line.price_paid, 0)  as total_price_paid
, COALESCE(received.amount, 0)  as total_received_amount
, COALESCE(converted.amount, 0) as total_converted_amount
FROM currency AS curr
LEFT JOIN (
  SELECT s.iso_number
  , SUM(sl.price_paid) AS price_paid
  FROM sale_lines sl
  JOIN CTE_SALE s ON s.sale_id = sl.sale_id
  GROUP BY s.iso_number
) AS line 
  ON line.iso_number = curr.iso_number
LEFT JOIN (
  SELECT tr.received_currency_id as iso_number
  , SUM(tr.received_amount) AS amount
  FROM cash_transactions tr
  JOIN CTE_SALE s ON s.sale_id = tr.sale_id
  GROUP BY tr.received_currency_id
) AS received
  ON received.iso_number = curr.iso_number
LEFT JOIN (
  SELECT tr.converted_currency_id as iso_number
  , SUM(tr.converted_amount) AS amount
  FROM cash_transactions AS tr
  JOIN CTE_SALE s ON s.sale_id = tr.sale_id
  GROUP BY tr.converted_currency_id
) AS converted
  ON converted.iso_number = curr.iso_number;
currency | total_price_paid | total_received_amount | total_converted_amount
:------- | ---------------: | --------------------: | ---------------------:
DKK      |              500 |                   300 |                    700
SEK      |              200 |                   400 |                      0
NOK      |                0 |                   150 |                      0

db<>fiddle here

  • Related