Aggregate window function and outer join

I am trying to solve the following problem efficiently. My current implementation relies on ugly loops and is painfully slow.

Specifically, I have a table (transactions) of timestamped orders per customer for various items:

timestamp	customer	item	volume
2000	Joe	A	100
2001	Joe	A	200
2001	Doe	A	100

I also have a second table (valuations) with prices for the items:

timestamp	item	price
2000	A	1.1
2001	A	1.2
2002	A	1.3

Now, I would like to track the value (price*stock) of each customer's stock (cumulative volume) sampled according to the timestamp in the valuations table:

timestamp	customer	item	stock	value
2000	Joe	A	100	110
2001	Joe	A	300	360
2002	Joe	A	300	390
2001	Doe	A	100	120
2002	Doe	A	100	130

Essentially, this is some form of (right) join between transactions and valuations. The catch is that I need one right join per (customer, item) combination: for every (customer, item), I have to join against the full set of valuation timestamps.

My current (probably quite inefficient) solution loops over customers. For every customer, it computes the cumulative volume, right joins valuations, and forward-fills the columns coming from the transactions table (using a custom last aggregate):

create or replace function public.last_func(anyelement, anyelement)
returns anyelement language sql immutable strict
as $$
    select $2;
$$;

-- Aggregate wrapper so that last(...) can be used as a window function.
-- Because last_func is strict, NULL inputs are skipped, which gives the forward-fill.
create aggregate public.last(anyelement) (
    sfunc = public.last_func,
    stype = anyelement
);

select 
    valuations.timestamp,
    last(t.customer) over (partition by valuations.item order by valuations.timestamp) as customer,
    valuations.item,
    last(t.stock) over (partition by valuations.item order by valuations.timestamp) as stock,
    last(t.stock) over (partition by valuations.item order by valuations.timestamp) * valuations.price as value
from (select 
    timestamp,
    customer,
    item,
    volume as order_volume,
    sum(volume) over (partition by item order by item, timestamp) as stock
from 
    transactions
where customer = 'Joe') t
right join 
    valuations on t.timestamp = valuations.timestamp and t.item = valuations.item;

This seems rather inefficient and becomes very slow for a large number of customers. Does anyone have an idea how to do that in one go? Would be great if you could help me out here.

Thanks in advance and best regards

CodePudding user response：

Just a suggestion, since I can't test this on large amounts of data.

What if you used a temporary table that contains all expected combinations of customers and valuations?

Then left join it to transactions to calculate the rolling sum.

For example:

create temporary table tmp_customer_valuations (
 timestamp int not null, 
 item varchar(30) not null, 
 customer varchar(30) not null, 
 price decimal(10,1) not null
);

insert into tmp_customer_valuations
(timestamp, item, price, customer)
select v.timestamp, v.item, v.price, c.customer
from valuations v
join (
  select item, customer, min(timestamp) as min_timestamp
  from transactions
  group by item, customer 
) c
  on c.item = v.item
 and c.min_timestamp <= v.timestamp;

create index idx_tmp_customer_valuations
on tmp_customer_valuations (timestamp, item);

select 
  tmp.timestamp
, tmp.customer
, tmp.item
--, tr.volume as order_volume,
, sum(coalesce(tr.volume, 0)) 
     over (partition by tmp.item, tmp.customer 
           order by tmp.timestamp) as stock
, tmp.price * sum(coalesce(tr.volume, 0)) 
     over (partition by tmp.item, tmp.customer 
           order by tmp.timestamp) as value
from tmp_customer_valuations tmp
left join transactions tr
  on tr.timestamp = tmp.timestamp 
 and tr.item = tmp.item
 and tr.customer = tmp.customer
order by
 tmp.customer desc,
 tmp.item,
 tmp.timestamp;

timestamp	customer	item	stock	value
2000	Joe	A	100	110.0
2001	Joe	A	300	360.0
2002	Joe	A	300	390.0
2001	Doe	A	100	120.0
2002	Doe	A	100	130.0
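
If creating a temporary table is not an option, the same logic can probably be written as a single statement. The following is only a sketch, assuming PostgreSQL and the table and column names from the question; the CTE name customer_valuations and the window name w are made up for illustration:

```sql
-- Sketch only: same logic as the temporary-table version, in one statement.
-- customer_valuations and w are illustrative names, not part of the original schema.
with customer_valuations as (
  -- One row per (customer, item, valuation timestamp),
  -- starting at the customer's first transaction for that item.
  select v.timestamp, v.item, v.price, c.customer
  from valuations v
  join (
    select item, customer, min(timestamp) as min_timestamp
    from transactions
    group by item, customer
  ) c
    on c.item = v.item
   and c.min_timestamp <= v.timestamp
)
select
  cv.timestamp,
  cv.customer,
  cv.item,
  -- Running sum of volumes; timestamps without a transaction contribute 0.
  sum(coalesce(tr.volume, 0)) over w as stock,
  cv.price * sum(coalesce(tr.volume, 0)) over w as value
from customer_valuations cv
left join transactions tr
  on tr.timestamp = cv.timestamp
 and tr.item = cv.item
 and tr.customer = cv.customer
window w as (partition by cv.item, cv.customer order by cv.timestamp)
order by cv.customer desc, cv.item, cv.timestamp;
```

Like the temporary-table version, this assumes at most one transaction per (customer, item, timestamp) and that every transaction timestamp also appears in valuations. Otherwise rows would be duplicated or volumes would be dropped from the running sum.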


(By the way, also check whether the tables could use an extra index.)
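
As an illustration of the index suggestion, something like the following might help the left join on transactions. It is only a sketch: the index name is hypothetical, and the column order is an assumption based on the join and partition columns used above, so it is worth checking with EXPLAIN ANALYZE on real data:

```sql
-- Hypothetical index name; the column order is an assumption based on the
-- join condition (item, customer, timestamp) used in the query above.
create index if not exists idx_transactions_item_customer_ts
  on transactions (item, customer, timestamp);

-- Refresh planner statistics so the new index is considered.
analyze transactions;
```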

CodePudding user response：

You can use the following query, which joins every valuation to all earlier transactions of the same item and sums the volumes per customer:

SELECT timestamp,customer,item,
    stock,
    stock * price AS value
FROM

  (SELECT v.timestamp,t.customer,v.price,t.item,SUM(t.volume) AS stock
  FROM valuations v 
  JOIN transactions t  ON t.item = v.item AND t.timestamp <= v.timestamp
  GROUP BY v.timestamp,t.customer,t.item,v.price) t1
  
ORDER BY customer DESC,timestamp;

SELECT t.timestamp,t.customer,t.item,stock,stock * price AS value
FROM

    (SELECT t.timestamp,t.customer,t.item,v.price,
       SUM(t.volume) OVER(PARTITION BY t.timestamp ORDER BY t.customer,t.timestamp DESC) as stock
    FROM transactions t JOIN valuations v ON t.timestamp = v.timestamp


    UNION ALL

    SELECT timestamp,customer,item,price,stock
    FROM valuations v CROSS JOIN

      (SELECT customer,MAX(stock) stock
       FROM
         (SELECT *,SUM(volume) OVER(PARTITION BY timestamp ORDER BY customer,timestamp DESC) stock
          FROM transactions) t
       GROUP BY customer) t
    WHERE timestamp NOT IN (SELECT timestamp FROM transactions)) t


ORDER BY customer DESC,price;
