Partitioning (and splitting) values to fill slots-CodePudding

I have some sales data and I need to dispatch the quantities in available slots stored in a separate table.

Example of sales data:

id	qty
1	1
2	1
3	1
4	1
5	3
6	9

create table sales (id serial primary key, qty int);

INSERT INTO sales (qty) VALUES (1),(1),(1),(1),(3),(9);

And here are the slots (size is cumulative):

name	size
a	3
b	5
c	9

There is an "invisible" d slot for the remaining quantities. Of course, these slots are not fixed.

create table slots (id serial primary key, name text, size int);

insert into slots (name,size) values ('a',3),('b',5),('c',9);

This is the detailed process of the calculation (and expected result):

id	qty	used	total	slot	comment
1	1	1	1	a
2	1	1	2	a
3	1	1	3	a	Slot is full
4	1	1	4	b
5	3	1	5	b	split this sale
5	cont.	2	6-7	c
6	9	2	8-9	c
6	cont.	7	10-16	d	split again

My approach is the following:

WITH p1 AS (
    select sales.*,sum(qty) over (order by sales.id) AS tot 
    from sales 
    order by sales.id
)
, p2 AS (
    select p1.id
        ,p1.qty
        , 1 COALESCE(LAG(tot) OVER (ORDER BY p1.id) ,0) AS lag
        , p1.tot
    from p1
)
,p3 AS (
    SELECT id,qty,lag,tot
        , COALESCE( (SELECT id FROM slots WHERE lag<=size ORDER BY size LIMIT 1) , 999) AS slotmin
        , COALESCE( (SELECT id FROM slots WHERE tot<=size ORDER BY size LIMIT 1) , 999) AS slotmax
    FROM p2
)
SELECT p3.id
    ,p3.qty
    -- ,p3.lag
    -- ,p3.tot
    -- ,p3.slotmin
    -- ,p3.slotmax
    ,COALESCE(slots.size-lag 1,p3.qty) AS qtyconsumed
    ,1 AS so
from p3
LEFT JOIN slots ON slots.id=p3.slotmin AND slotmin<>slotmax
UNION
SELECT p3.id
    ,p3.qty
    -- ,p3.lag
    -- ,p3.tot
    -- ,p3.slotmin
    -- ,p3.slotmax
    ,COALESCE(p3.qty-(slots.size-lag 1),0) AS qtyconsumed
    ,2 AS so
from p3
LEFT JOIN slots ON slots.id=p3.slotmin
WHERE slotmin<>slotmax
ORDER BY id,so

I deliberately left some commented columns to ease understanding/debugging my mental process

Which gives the following result:

 id | qty | qtyconsumed | so 
---- ----- ------------- ----
  1 |   1 |           1 |  1
  2 |   1 |           1 |  1
  3 |   1 |           1 |  1
  4 |   1 |           1 |  1
  5 |   3 |           1 |  1
    |     |           2 |  2
  6 |   9 |           2 |  1
    |     |           7 |  2
(8 lignes)

The code is pretty ugly and will only work for one split: I could very well have only one sale of qty=20 that should split to 4 slots.

My question unfolds in 2 parts:

Is it possible to do that splitting in more than 2 slots without going through the mess of cursors or solutions alike? I am guessing WITH RECURSIVE but there could be something smarter.
How can I optimize this (particularly the search for the closest slot in p3) because the real case involves millions of rows?

CodePudding user response：

I propose you a different approach based on the range types and range operations :

First we can calculate the range of quantities for every sale in the sales id order :

SELECT id, qty
     , int4range
          ( (1   sum(qty) OVER w - qty) :: integer
          , sum(qty) OVER w :: integer
          , '[]'
          ) AS sale_range
  FROM sales
WINDOW w AS (ORDER by id)
 ORDER BY id

Then we can calculate the range of sizes for every slot in the slot size order :

SELECT name, size
     , int4range
          ( (1   sum(size) OVER w - COALESCE(size, 0)) :: integer
          , CASE WHEN size IS NULL THEN NULL :: integer ELSE sum(size) OVER w :: integer END
          , '[]'
          ) AS slot_range
  FROM slots
WINDOW w AS (ORDER BY size)
 ORDER BY size

Finally we can simply JOIN both sub queries while testing the overlap between the sales ranges and the slots ranges in the ON clause. For the rows which match, the intersection between the sales range and the slot range provides the total value as expected, and the length of the intersected ranges (upper(range) - lower(range)) provides the used quantity :

WITH sales_ranges AS
(
SELECT id, qty
     , int4range
          ( (1   sum(qty) OVER w - qty) :: integer
          , sum(qty) OVER w :: integer
          , '[]'
          ) AS sale_range
  FROM sales
WINDOW w AS (ORDER by id)
-- ORDER BY id -- this ORDER BY is useless here
), slots_ranges AS
(
SELECT name, size
     , int4range
          ( (1   sum(size) OVER w - COALESCE(size, 0)) :: integer
          , CASE WHEN size IS NULL THEN NULL :: integer ELSE sum(size) OVER w :: integer END
          , '[]'
          ) AS slot_range
  FROM slots
WINDOW w AS (ORDER BY size)
-- ORDER BY size -- this ORDER BY is useless here
)
SELECT sa.id, sa.qty
     , upper(rg.range) - lower(rg.range) AS used
     , CASE
         WHEN upper(rg.range) = lower(rg.range)   1
         THEN lower(rg.range) :: text
         ELSE lower(rg.range) || '-' || upper(rg.range) - 1
       END AS total
     , so.name AS slot
  FROM sales_ranges AS sa
 INNER JOIN slots_ranges AS so
    ON sa.sale_range && so.slot_range -- both ranges overlap
 CROSS JOIN LATERAL (SELECT sa.sale_range * so.slot_range) AS rg(range) -- compute the intersection of both ranges

The proposed solution assumes that slot "d" is created in table "slots" with a NULL size.

see the test result in dbfiddle

CodePudding user response：

You can do:

select id, qty, last_pos - first_pos   1 as used,
  case when last_pos = first_pos then '' || first_pos
       else '' || first_pos || '-' || last_pos end as total,
  name as slot
from (
select
  b.id, b.qty, 
  greatest(b.first_pos, s.first_pos) as first_pos,
  least(b.last_pos, s.last_pos) as last_pos,
  s.name
from (
  select *, 
    sum(qty) over(order by id) - qty   1 as first_pos,
    sum(qty) over(order by id) as last_pos
  from sales
) b 
left join (
  select *,
    coalesce(lag(size) over(order by id), 0)   1 as first_pos,
    size as last_pos
  from (select * from slots union all
        select max(id)   1, 'd', max(size)   1000000000 from slots) x
) s on b.last_pos between s.first_pos and s.last_pos
    or b.first_pos between s.first_pos and s.last_pos
) j

Result:

 id  qty  used  total  slot 
 --- ---- ----- ------ ---- 
 1   1    1     1      a    
 2   1    1     2      a    
 3   1    1     3      a    
 4   1    1     4      b    
 5   3    1     5      b    
 5   3    2     6-7    c    
 6   9    2     8-9    c    
 6   9    7     10-16  d

See running example at DB Fiddle.

Essentially, you can compute the corresponding positions of the ordered sets on each side, and then you join by them. The query above produces all the information you need to post-process the data and add extra info as needed.

Note: High performance should be trivial and can be achieved using the right indexes. You can try the query and see if it's fast enough. If it's not, then post the execution plan and the existng indexes for further improvement.