I have some sales data, and I need to dispatch the quantities into available slots stored in a separate table.
Example of sales data:
id | qty
---|----
1  | 1
2  | 1
3  | 1
4  | 1
5  | 3
6  | 9
create table sales (id serial primary key, qty int);
INSERT INTO sales (qty) VALUES (1),(1),(1),(1),(3),(9);
And here are the slots (size is cumulative):
name | size
-----|-----
a    | 3
b    | 5
c    | 9
There is an "invisible" d slot for the remaining quantities. Of course, these slots are not fixed.
create table slots (id serial primary key, name text, size int);
insert into slots (name,size) values ('a',3),('b',5),('c',9);
This is the detailed process of the calculation (and expected result):
id | qty   | used | total | slot | comment
---|-------|------|-------|------|----------------
1  | 1     | 1    | 1     | a    |
2  | 1     | 1    | 2     | a    | slot is full
3  | 1     | 1    | 3     | a    |
4  | 1     | 1    | 4     | b    |
5  | 3     | 1    | 5     | b    | split this sale
5  | cont. | 2    | 6-7   | c    |
6  | 9     | 2    | 8-9   | c    |
6  | cont. | 7    | 10-16 | d    | split again
My approach is the following:
WITH p1 AS (
select sales.*,sum(qty) over (order by sales.id) AS tot
from sales
order by sales.id
)
, p2 AS (
select p1.id
,p1.qty
, 1 + COALESCE(LAG(tot) OVER (ORDER BY p1.id), 0) AS lag
, p1.tot
from p1
)
,p3 AS (
SELECT id,qty,lag,tot
, COALESCE( (SELECT id FROM slots WHERE lag<=size ORDER BY size LIMIT 1) , 999) AS slotmin
, COALESCE( (SELECT id FROM slots WHERE tot<=size ORDER BY size LIMIT 1) , 999) AS slotmax
FROM p2
)
SELECT p3.id
,p3.qty
-- ,p3.lag
-- ,p3.tot
-- ,p3.slotmin
-- ,p3.slotmax
,COALESCE(slots.size - lag + 1, p3.qty) AS qtyconsumed
,1 AS so
from p3
LEFT JOIN slots ON slots.id=p3.slotmin AND slotmin<>slotmax
UNION
SELECT p3.id
,p3.qty
-- ,p3.lag
-- ,p3.tot
-- ,p3.slotmin
-- ,p3.slotmax
,COALESCE(p3.qty - (slots.size - lag + 1), 0) AS qtyconsumed
,2 AS so
from p3
LEFT JOIN slots ON slots.id=p3.slotmin
WHERE slotmin<>slotmax
ORDER BY id,so
I deliberately left some commented-out columns to make my mental process easier to follow and debug.
Which gives the following result:
id | qty | qtyconsumed | so
---|-----|-------------|---
1  | 1   | 1           | 1
2  | 1   | 1           | 1
3  | 1   | 1           | 1
4  | 1   | 1           | 1
5  | 3   | 1           | 1
   |     | 2           | 2
6  | 9   | 2           | 1
   |     | 7           | 2
(8 rows)
The code is pretty ugly and will only handle one split: I could very well have a single sale of qty = 20 that should be split across 4 slots.
My question has two parts:
Is it possible to do that splitting across more than 2 slots without going through the mess of cursors or similar solutions? I am guessing WITH RECURSIVE, but there could be something smarter.
How can I optimize this (particularly the search for the closest slot in p3), given that the real case involves millions of rows?
CodePudding user response:
I propose a different approach, based on range types and range operations.
First, we can calculate the range of quantities for every sale, in sale id order:
SELECT id, qty
, int4range
( (1 + sum(qty) OVER w - qty) :: integer
, sum(qty) OVER w :: integer
, '[]'
) AS sale_range
FROM sales
WINDOW w AS (ORDER by id)
ORDER BY id
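For the sample data, the cumulative quantities are 1, 2, 3, 4, 7 and 16, so the sale ranges come out as [1,2), [2,3), [3,4), [4,5), [5,8) and [8,17): int4range canonicalizes inclusive bounds to half-open form, which can be checked in isolation:

```sql
-- int4range stores discrete ranges in canonical half-open form,
-- so positions 5..7 of sale id 5 become [5,8):
SELECT int4range(5, 7, '[]');  -- [5,8)
```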
Then we can calculate the range of sizes for every slot, in slot size order:
SELECT name, size
, int4range
( (1 + sum(size) OVER w - COALESCE(size, 0)) :: integer
, CASE WHEN size IS NULL THEN NULL :: integer ELSE sum(size) OVER w :: integer END
, '[]'
) AS slot_range
FROM slots
WINDOW w AS (ORDER BY size)
ORDER BY size
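Assuming the NULL-size "d" row mentioned at the end of this answer, the cumulative sizes are 3, 8 and 17, so the slot ranges are a = [1,4), b = [4,9), c = [9,18) and d = [18,). A NULL upper bound yields an unbounded range, which is what makes the catch-all slot work:

```sql
-- A NULL upper bound produces an unbounded range, which is how the
-- catch-all slot "d" absorbs everything past position 17:
SELECT int4range(18, NULL, '[]');  -- [18,)
```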
Finally, we can simply JOIN both subqueries, testing the overlap between the sale ranges and the slot ranges in the ON clause. For the rows that match, the intersection of the sale range and the slot range provides the expected total value, and the length of the intersected range (upper(range) - lower(range)) provides the used quantity:
WITH sales_ranges AS
(
SELECT id, qty
, int4range
( (1 + sum(qty) OVER w - qty) :: integer
, sum(qty) OVER w :: integer
, '[]'
) AS sale_range
FROM sales
WINDOW w AS (ORDER by id)
-- ORDER BY id -- this ORDER BY is useless here
), slots_ranges AS
(
SELECT name, size
, int4range
( (1 + sum(size) OVER w - COALESCE(size, 0)) :: integer
, CASE WHEN size IS NULL THEN NULL :: integer ELSE sum(size) OVER w :: integer END
, '[]'
) AS slot_range
FROM slots
WINDOW w AS (ORDER BY size)
-- ORDER BY size -- this ORDER BY is useless here
)
SELECT sa.id, sa.qty
, upper(rg.range) - lower(rg.range) AS used
, CASE
WHEN upper(rg.range) = lower(rg.range) + 1
THEN lower(rg.range) :: text
ELSE lower(rg.range) || '-' || upper(rg.range) - 1
END AS total
, so.name AS slot
FROM sales_ranges AS sa
INNER JOIN slots_ranges AS so
ON sa.sale_range && so.slot_range -- both ranges overlap
CROSS JOIN LATERAL (SELECT sa.sale_range * so.slot_range) AS rg(range) -- compute the intersection of both ranges
The proposed solution assumes that slot "d" is created in table "slots" with a NULL size.
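For reference, that catch-all row can be created like this (the name "d" matches the expected result; any name works):

```sql
-- Add the unbounded catch-all slot: a NULL size means "no upper limit".
INSERT INTO slots (name, size) VALUES ('d', NULL);
```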
See the test result in dbfiddle.
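To recap the two range operators the final query relies on (example values chosen arbitrarily):

```sql
-- && tests whether two ranges overlap:
SELECT int4range(5, 7, '[]') && int4range(6, 9, '[]');  -- true
-- * computes the intersection of two ranges:
SELECT int4range(5, 7, '[]') * int4range(6, 9, '[]');   -- [6,8)
```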
CodePudding user response:
You can do:
select id, qty, last_pos - first_pos + 1 as used,
case when last_pos = first_pos then '' || first_pos
else '' || first_pos || '-' || last_pos end as total,
name as slot
from (
select
b.id, b.qty,
greatest(b.first_pos, s.first_pos) as first_pos,
least(b.last_pos, s.last_pos) as last_pos,
s.name
from (
select *,
sum(qty) over(order by id) - qty 1 as first_pos,
sum(qty) over(order by id) as last_pos
from sales
) b
left join (
select *,
coalesce(lag(size) over(order by id), 0) + 1 as first_pos,
size as last_pos
from (select * from slots union all
select max(id) + 1, 'd', max(size) + 1000000000 from slots) x
) s on b.last_pos between s.first_pos and s.last_pos
or b.first_pos between s.first_pos and s.last_pos
) j
Result:
id | qty | used | total | slot
---|-----|------|-------|-----
1  | 1   | 1    | 1     | a
2  | 1   | 1    | 2     | a
3  | 1   | 1    | 3     | a
4  | 1   | 1    | 4     | b
5  | 3   | 1    | 5     | b
5  | 3   | 2    | 6-7   | c
6  | 9   | 2    | 8-9   | c
6  | 9   | 7    | 10-16 | d
See running example at DB Fiddle.
Essentially, you compute the corresponding positions of the ordered sets on each side, and then join on them. The query above produces all the information you need to post-process the data and add extra info as needed.
Note: performance can be tuned with the right indexes. Try the query and see if it's fast enough. If it's not, post the execution plan and the existing indexes for further improvement.