Home > front end >  Get most recent row whose timestamp is at least 5 minutes ago using postgres, timescaledb
Get most recent row whose timestamp is at least 5 minutes ago using postgres, timescaledb

Time:12-11

I am having trouble with something that I think timescaledb can help with.

Suppose I have this table:

CREATE TABLE purchase
(
    id integer NOT NULL DEFAULT nextval('purchase_id_seq'::regclass),
    "timestamp" timestamp without time zone NOT NULL,
    country character varying(128) COLLATE pg_catalog."default",
    product character varying(128) COLLATE pg_catalog."default",
    quantity numeric(64,32),
    price numeric(64,32)
)

Which represents Purchases:

  • id
  • timestamp of purchase
  • country it occurred in
  • product being purchased
  • quantity purchased
  • price paid per 1 quantity

For each purchase, I want to calculate (pseudo code):

price - last price for given (country, product) where timestamp - timestamp of old record > 5 minutes

For example, if I have these purchases:

id timestamp              country product quantity price
1  2021-12-09 07:12:11.13 US      apple   1        1.2
2  2021-12-09 07:13:11.13 US      apple   2        1.3
3  2021-12-09 07:19:12.13 US      apple   2        1.4
4  2021-12-09 07:20:19.13 US      apple   2        0.9

Then I would have these deltas

id timestamp              country product quantity price last_price_at_least_five_minutes_ago
1  2021-12-09 07:12:11.13 US      apple   1        1.2   NULL
2  2021-12-09 07:13:11.13 US      apple   2        1.3   NULL
3  2021-12-09 07:17:12.13 US      apple   2        1.4   1.2
4  2021-12-09 07:20:19.13 US      apple   2        0.9   1.3

For each CURRENTROW, what is the easiest way to ask for "the price of the row with the highest timestamp that is less than CURRENTROW - '5 minutes' ?

Stupidly I have tried this:

SELECT
    t1.country,
    t1.product,
    t1.timestamp,
    t1.id,
    t1.price,
    t2.id AS last_id,
    t2.timestamp AS last_timestamp,
    t2.price AS last_price
FROM
    purchase t1
LEFT JOIN purchase t2
ON
    t2.timestamp < t1.timestamp - INTERVAL '5m' AND
    t1.country = t2.country AND
    t1.product = t2.product
GROUP BY
    t1.country,
    t1.product,
    t1.id,
    t1.price,
    t1.timestamp,
    t2.id,
    t2.price,
    t2.timestamp

But this hangs and I am sure it is doing a lot of unnecessary work, as the result set of the join is massive (each row * N where N is number of rows that are 5m ago)

CodePudding user response:

You can use a scalar subquery reading literally "the price of the row with the highest timestamp that is less than CURRENTROW - 5 minutes".

select *,      
(
  select price 
  from purchase 
  where product = currentrow.product -- more conditions can be added here
  and "timestamp" < currentrow."timestamp" - interval '5 minutes' -- "less than CURRENTROW - 5 minutes"
  order by "timestamp" desc limit 1 -- "the highest timestamp"
) as last_price_at_least_five_minutes_ago
from purchase as currentrow;

currentrow is not a good enough alias for purchase table but fits well in the logic of the subquery.

CodePudding user response:

You could also use a window function, although you'd have to determine any performance indications on larger datasets and the range you're querying.

This example selects the last value in the partition (by product in this case) with time ordered ASC. In order to get a value that is at least 5 minutes ago, you have to setup a window range that starts longer ago than 5 minutes (this uses UNBOUNDED PRECEDING) and 5 minutes ago ('5 minutes'::interval PRECEDING).

Using UNBOUNDED PRECEDING could end up being costly depending on how many rows you have, so you could set some kind of starting interval if you knew/expect some kind of regularity to your data (something like range between '20 minute'::interval PRECEDING and '5 minute'::interval PRECEDING)

select ts, country, product,quantity,price, 
    last_value(purchase.price) over w as last_price_at_least_five_minutes_ago
from purchase
window w as (partition by product order by ts range between unbounded PRECEDING and '5 minutes'::interval PRECEDING)
order by ts asc;

Results in:

ts                     |country|product|quantity|price|last_price_at_least_five_minutes_ago|
----------------------- ------- ------- -------- ----- ------------------------------------ 
2021-12-09 07:12:11.130|US     |apple  |     1.0|  1.2|                                    |
2021-12-09 07:13:11.130|US     |apple  |     2.0|  1.3|                                    |
2021-12-09 07:17:12.130|US     |apple  |     2.0|  1.4|                                 1.2|
2021-12-09 07:20:19.130|US     |apple  |     2.0|  0.9|                                 1.3|
  • Related