Im trying to understand windowing functions in Postgres and I don't get it. Here's an example. Just trying to get the last value of an record set restricted with where clause:
tid | pid | position | recorded | uvalue | cvalue | record_id
----- ----- ---------- --------------------- -------- -------- -----------
10 | 2 | 100 | 2022-02-01 12:00:00 | 270 | 270 | 102
10 | 2 | 100 | 2022-03-15 12:00:00 | 270 | 254 | 102
10 | 2 | 100 | 2022-11-17 12:00:00 | 270 | | 102
10 | 2 | 100 | 2022-12-01 14:00:00 | 270 | | 102
10 | 2 | 100 | 2023-02-03 11:00:00 | 270 | | 102
10 | 2 | 100 | 2023-03-01 04:00:00 | 270 | | 102
10 | 2 | 100 | 2023-04-01 04:00:00 | 270 | 251 | 102
And this the query
select last_value(cvalue) over (order by recorded)
from data
where recorded < '2022-11-17 12:00:00';
results in
last_value
------------
270
254
(2 rows)
It should be 254?
CodePudding user response:
The window functions calculate one result for each row selected by the WHERE
clause.
In your example there are two rows where recorded < '2022-11-17 12:00:00'
so it is normal that you get two rows in the final result.
Then the window is calculated for each row by the OVER()
clause which is by default equivalent to OVER(ROWS BETWEEN UNBOUDED PRECEDING AND CURRENT ROW)
. So it is again normal that you get 270
and 254
for both resulting rows aligned with the over (order by recorded)
clause of your query.
You should read carefully the manual §3.5 and §4.2.8.
If you only want one resulting row then you can do :
select cvalue
from data
where recorded < '2022-11-17 12:00:00'
order by recorded DESC
limit 1
or you can use an aggregate function :
select (array_agg(cvalue ORDER BY recorded DESC))[1]
from data
where recorded < '2022-11-17 12:00:00'
see the test result in dbfiddle
CodePudding user response:
This is a documented behaviour
Note that first_value, last_value, and nth_value consider only the rows within the “window frame”, which by default contains the rows from the start of the partition through the last peer of the current row
https://www.postgresql.org/docs/14/functions-window.html
If you want the other frame specify it explicitly, for example
with cte as (
select last_value(cvalue) over (order by recorded
rows between unbounded preceding and unbounded following) as lv
from data
)
select *
from cte
where recorded < '2022-11-17 12:00:00';
CodePudding user response:
last_value
will be executed for each row after applying the where clause. The where clause matches two rows so you get two rows in the result.
Your query expected result could be obtained by using order by
with offset fetch
:
select cvalue
from t
where recorded < '2022-11-17 12:00:00'
order by recorded desc
offset 0 rows
fetch first 1 row only
CodePudding user response:
Because last_value
window function will return the latest value each row after where
In your case, it will be suitable to use first_value
window function with order by recorded desc
that might return the latest cvalue
by your condition.
select first_value(cvalue) over (order by recorded desc)
from t
where recorded < '2022-11-17 12:00:00'