Home > OS >  Postgres LAST_VALUE returns not the last value when using where clause in main query
Postgres LAST_VALUE returns not the last value when using where clause in main query

Time:03-08

Im trying to understand windowing functions in Postgres and I don't get it. Here's an example. Just trying to get the last value of an record set restricted with where clause:

 tid | pid | position |      recorded       | uvalue | cvalue | record_id 
----- ----- ---------- --------------------- -------- -------- -----------
  10 |   2 |      100 | 2022-02-01 12:00:00 |    270 |    270 |       102
  10 |   2 |      100 | 2022-03-15 12:00:00 |    270 |    254 |       102
  10 |   2 |      100 | 2022-11-17 12:00:00 |    270 |        |       102
  10 |   2 |      100 | 2022-12-01 14:00:00 |    270 |        |       102
  10 |   2 |      100 | 2023-02-03 11:00:00 |    270 |        |       102
  10 |   2 |      100 | 2023-03-01 04:00:00 |    270 |        |       102
  10 |   2 |      100 | 2023-04-01 04:00:00 |    270 |    251 |       102

And this the query

select last_value(cvalue) over (order by recorded) 
from data 
where recorded < '2022-11-17 12:00:00';

results in

 last_value 
------------
        270
        254
(2 rows)

It should be 254?

CodePudding user response:

The window functions calculate one result for each row selected by the WHERE clause.

In your example there are two rows where recorded < '2022-11-17 12:00:00' so it is normal that you get two rows in the final result.

Then the window is calculated for each row by the OVER() clause which is by default equivalent to OVER(ROWS BETWEEN UNBOUDED PRECEDING AND CURRENT ROW). So it is again normal that you get 270 and 254 for both resulting rows aligned with the over (order by recorded) clause of your query.

You should read carefully the manual §3.5 and §4.2.8.

If you only want one resulting row then you can do :

select cvalue
  from data 
 where recorded < '2022-11-17 12:00:00'
 order by recorded DESC
 limit 1

or you can use an aggregate function :

select (array_agg(cvalue ORDER BY recorded DESC))[1]
  from data 
 where recorded < '2022-11-17 12:00:00'

see the test result in dbfiddle

CodePudding user response:

This is a documented behaviour

Note that first_value, last_value, and nth_value consider only the rows within the “window frame”, which by default contains the rows from the start of the partition through the last peer of the current row

https://www.postgresql.org/docs/14/functions-window.html

If you want the other frame specify it explicitly, for example

with cte as (
    select last_value(cvalue) over (order by recorded
       rows between unbounded preceding and unbounded following) as lv
    from data
)
select *
from cte
where recorded < '2022-11-17 12:00:00';

CodePudding user response:

last_value will be executed for each row after applying the where clause. The where clause matches two rows so you get two rows in the result.

Your query expected result could be obtained by using order by with offset fetch:

select cvalue
from t 
where recorded < '2022-11-17 12:00:00'
order by recorded desc
offset 0 rows
fetch first 1 row only

CodePudding user response:

Because last_value window function will return the latest value each row after where

In your case, it will be suitable to use first_value window function with order by recorded desc that might return the latest cvalue by your condition.

select first_value(cvalue) over (order by recorded desc)
from t
where recorded < '2022-11-17 12:00:00'
  • Related