I am trying to solve the following question in a performance-oriented way. My current implementation involves ugly loops and is painfully slow.
Specifically, I have a table (transactions) of timestamped orders per customer for various items:
timestamp | customer | item | volume |
---|---|---|---|
2000 | Joe | A | 100 |
2001 | Joe | A | 200 |
2001 | Doe | A | 100 |
Besides, I have a second table (valuations) showing prices for the items:
timestamp | item | price |
---|---|---|
2000 | A | 1.1 |
2001 | A | 1.2 |
2002 | A | 1.3 |
Now, I would like to track the value (price*stock) of each customer's stock (cumulative volume) sampled according to the timestamp in the valuations table:
timestamp | customer | item | stock | value |
---|---|---|---|---|
2000 | Joe | A | 100 | 110 |
2001 | Joe | A | 300 | 360 |
2002 | Joe | A | 300 | 390 |
2001 | Doe | A | 100 | 120 |
2002 | Doe | A | 100 | 130 |
Essentially, this is going to be some form of (right) joining transactions and valuations. However, the catch here is that I would have to do one right join per (customer, item) combination. In other words, for every (customer, item) I would have to join the full set of timestamps.
My current (potentially pretty inefficient) solution loops across customers. For every customer, it creates the cumulative volume, right joins valuations and forward-fills (using the last function) columns coming from the transactions table:
CREATE OR REPLACE FUNCTION public.last_func(anyelement, anyelement)
RETURNS anyelement
LANGUAGE sql
IMMUTABLE STRICT
AS $function$
select $2;
$function$
;
create or replace function last_func(anyelement, anyelement)
returns anyelement language sql immutable strict
as $$
select $2;
$$;
select
valuations.timestamp,
last(t.customer) over (partition by valuations.item order by valuations.timestamp) as customer,
valuations.item,
last(t.stock) over (partition by valuations.item order by valuations.timestamp) as stock,
last(t.stock) over (partition by valuations.item order by valuations.timestamp) * valuations.price as value
from (select
timestamp,
customer,
item,
volume as order_volume,
sum(volume) over (partition by item order by item, timestamp) as stock
from
transactions
where customer = 'Joe') t
right join
valuations on t.timestamp = valuations.timestamp and t.item = valuations.item
This seems rather inefficient and becomes very slow for a large number of customers. Does anyone have an idea how to do that in one go? Would be great if you could help me out here.
Thanks in advance and best regards
CodePudding user response:
Just a suggestion, since I can't test this on large amounts of data.
But what if you use a temporary table that contains all expected combinations of the customers and validations.
Then left join to customers to calculate the rolling sum.
For example:
create temporary table tmp_customer_valuations ( timestamp int not null, item varchar(30) not null, customer varchar(30) not null, price decimal(10,1) not null );
insert into tmp_customer_valuations (timestamp, item, price, customer) select v.timestamp, v.item, v.price, c.customer from valuations v join ( select item, customer, min(timestamp) as min_timestamp from transactions group by item, customer ) c on c.item = v.item and c.min_timestamp <= v.timestamp
create index idx_tmp_customer_valuations on tmp_customer_valuations (timestamp, item)
select tmp.timestamp , tmp.customer , tmp.item --, tr.volume as order_volume, , sum(coalesce(tr.volume, 0)) over (partition by tmp.item, tmp.customer order by tmp.timestamp) as stock , tmp.price * sum(coalesce(tr.volume, 0)) over (partition by tmp.item, tmp.customer order by tmp.timestamp) as value from tmp_customer_valuations tmp left join transactions tr on tr.timestamp = tmp.timestamp and tr.item = tmp.item and tr.customer = tmp.customer order by tmp.customer desc, tmp.item, tmp.timestamp;
timestamp | customer | item | stock | value |
---|---|---|---|---|
2000 | Joe | A | 100 | 110.0 |
2001 | Joe | A | 300 | 360.0 |
2002 | Joe | A | 300 | 390.0 |
2001 | Doe | A | 100 | 120.0 |
2002 | Doe | A | 100 | 130.0 |
db<>fiddle here
(Btw, also verify if the tables could use an extra index)
CodePudding user response:
You can use the following query
SELECT timestamp,customer,item,
SUM(volume) OVER(PARTITION BY timestamp ORDER BY customer,timestamp DESC) AS stock,
SUM(volume) OVER(PARTITION BY timestamp ORDER BY customer,timestamp DESC) * price AS value
FROM
(SELECT v.timestamp,t.customer,v.price,t.item,MAX(t.volume) AS volume
FROM valuations v
JOIN transactions t ON t.timestamp <= v.timestamp
GROUP BY v.timestamp,t.customer,t.item,v.price) t1
ORDER BY customer DESC,price
or
SELECT t.timestamp,t.customer,t.item,stock,stock * price AS value
FROM
(SELECT t.timestamp,t.customer,t.item,v.price,
SUM(t.volume) OVER(PARTITION BY t.timestamp ORDER BY t.customer,t.timestamp DESC) as stock
FROM transactions t JOIN valuations v ON t.timestamp = v.timestamp
UNION ALL
SELECT timestamp,customer,item,price,stock
FROM valuations v CROSS JOIN
(SELECT customer,MAX(stock) stock
FROM
(SELECT *,SUM(volume) OVER(PARTITION BY timestamp ORDER BY customer,timestamp DESC) stock
FROM transactions) t
GROUP BY customer) t
WHERE timestamp NOT IN (SELECT timestamp FROM transactions)) t
ORDER BY customer DESC,price;
Demo In db<>fiddle