I have several tracked specimens that can move between different areas throughout the day (and its periods). In the example below the calculated field mov
has to restart counting whenever there is a change of specimen or day. If the period
varies but specimen
, day
and area
still the same then mov
= previous mov
. If both specimen
and day
remain the same and only the area
changes then mov
= previous mov
1.
Like this:
My questions are:
- How to copy the previous value of
mov
when period varies butspecimen
,day
andarea
remain the same? - How to increment the value when both
specimen
andday
remain the same and only thearea
changes?
I'm using PostgreSQL 12.11 (Ubuntu 12.11-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu.
Here is what I did so far:
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable(
specimen INTEGER,
day INTEGER,
period INTEGER,
area INTEGER
);
INSERT INTO mytable (specimen,day,period,area)
VALUES
(1,1,1,1),
(1,1,2,1),
(1,1,3,2),
(1,1,4,2),
(1,1,5,3),
(1,1,6,2),
(1,2,1,2),
(1,2,2,1),
(1,2,3,2),
(1,2,4,3),
(1,2,5,2),
(1,2,6,1),
(2,1,1,3),
(2,1,2,3),
(2,1,3,1);
SELECT
*,
CASE
WHEN previous_specimen Is NULL THEN 1
WHEN specimen != previous_specimen THEN 1
WHEN specimen = previous_specimen AND day != previous_day THEN 1
-- WHEN specimen = previous_specimen AND day = previous_day AND area = previous_area THEN LAG(mov,1) OVER (ORDER BY specimen,day,period,area) -- **repeat previous value**
-- WHEN specimen = previous_specimen AND day = previous_day AND area != previous_area THEN LAG(mov,1) OVER (ORDER BY specimen,day,period,area) 1 -- **add 1 to previous value**
ELSE NULL
END AS mov
FROM (
SELECT
*,
LAG(specimen,1) OVER (ORDER BY specimen,day,period,area) previous_specimen,
LAG(day,1) OVER (ORDER BY specimen,day,period,area) previous_day,
LAG(area,1) OVER (ORDER BY specimen,day,period,area) previous_area
FROM mytable
) t1;
CodePudding user response:
This simpler query should do it:
SELECT specimen, day, period, area
, count(*) FILTER (WHERE step)
OVER (PARTITION BY specimen, day ORDER BY period) AS mov
FROM (
SELECT *
, lag(area) OVER (PARTITION BY specimen, day ORDER BY period) <> area AS step
FROM tbl
) sub
ORDER BY specimen, day; -- optional
If period
is an incremented number without gaps for every (specimen, day)
, this is equivalent:
SELECT t1.*
, count(*) FILTER (WHERE t1.area <> t2.area)
OVER (PARTITION BY t1.specimen, t1.day ORDER BY t1.period) AS mov
FROM tbl t1
LEFT JOIN tbl t2 ON t2.specimen = t1.specimen
AND t2.day = t1.day
AND t2.period = t1.period - 1
;
db<>fiddle here
Not sure which is faster. An index on (specimen, day, period)
would help (a lot) either way.
See:
About the aggregate FILTER
clause:
Should be fastest: