How to copy the previous row value of the same column?-CodePudding

I have several tracked specimens that can move between different areas throughout the day (and its periods). In the example below the calculated field mov has to restart counting whenever there is a change of specimen or day. If the period varies but specimen, day and area still the same then mov = previous mov. If both specimen and day remain the same and only the area changes then mov = previous mov 1.

Like this:

My questions are:

How to copy the previous value of mov when period varies but specimen, day and area remain the same?
How to increment the value when both specimen and day remain the same and only the area changes?

I'm using PostgreSQL 12.11 (Ubuntu 12.11-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu.

Here is what I did so far:

DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable(
   specimen INTEGER,
   day      INTEGER,
   period   INTEGER,
   area     INTEGER
);


INSERT INTO mytable (specimen,day,period,area)
VALUES 
(1,1,1,1),
(1,1,2,1),
(1,1,3,2),
(1,1,4,2),
(1,1,5,3),
(1,1,6,2),
(1,2,1,2),
(1,2,2,1),
(1,2,3,2),
(1,2,4,3),
(1,2,5,2),
(1,2,6,1),
(2,1,1,3),
(2,1,2,3),
(2,1,3,1);

SELECT
*,
CASE
    WHEN previous_specimen Is NULL THEN 1
    WHEN specimen != previous_specimen THEN 1
    WHEN specimen = previous_specimen AND day != previous_day THEN 1
    -- WHEN specimen = previous_specimen AND day = previous_day AND area = previous_area THEN LAG(mov,1) OVER (ORDER BY specimen,day,period,area) -- **repeat previous value**
    -- WHEN specimen = previous_specimen AND day = previous_day AND area != previous_area THEN LAG(mov,1) OVER (ORDER BY specimen,day,period,area)   1 -- **add 1 to previous value**    
     ELSE NULL
END AS mov

FROM (
SELECT
*,
LAG(specimen,1) OVER (ORDER BY specimen,day,period,area) previous_specimen,
LAG(day,1) OVER (ORDER BY specimen,day,period,area) previous_day,
LAG(area,1) OVER (ORDER BY specimen,day,period,area) previous_area
FROM mytable
) t1;

CodePudding user response：

This simpler query should do it:

SELECT specimen, day, period, area 
     , count(*) FILTER (WHERE step)
                OVER (PARTITION BY specimen, day ORDER BY period) AS mov
FROM  (
   SELECT *
        , lag(area) OVER (PARTITION BY specimen, day ORDER BY period) <> area AS step
   FROM   tbl
   ) sub
ORDER  BY specimen, day;  -- optional

If period is an incremented number without gaps for every (specimen, day), this is equivalent:

SELECT t1.*
     , count(*) FILTER (WHERE t1.area <> t2.area)
                OVER (PARTITION BY t1.specimen, t1.day ORDER BY t1.period)  AS mov
FROM   tbl t1
LEFT   JOIN tbl t2 ON t2.specimen = t1.specimen
                  AND t2.day      = t1.day
                  AND t2.period   = t1.period - 1
;

db<>fiddle here

Not sure which is faster. An index on (specimen, day, period) would help (a lot) either way.

See:

About the aggregate FILTER clause:

Aggregate columns with additional (distinct) filters

Should be fastest:

For absolute performance, is SUM faster or COUNT?