I want to aggregate data based on streaks of continuous activity.
DDL:
CREATE TABLE t_series (t date, data int)
INSERT INTO t_series VALUES
(date '2018-03-01',12),
(date '2018-03-02',43),
(date '2018-03-03',9),
(date '2018-03-04',13),
(date '2018-03-09',23),
(date '2018-03-10',26),
(date '2018-03-11',28),
(date '2018-03-14',21),
(date '2018-03-15',15)
I want an intermediate output as:
t | data | period
------------ ------ ------
2018-03-01 | 12 | 1
2018-03-02 | 43 | 1
2018-03-03 | 9 | 1
2018-03-04 | 13 | 1
2018-03-09 | 23 | 2
2018-03-10 | 26 | 2
2018-03-11 | 28 | 2
2018-03-14 | 21 | 3
2018-03-15 | 15 | 3
And the final output as:
period | sum
-------- -----
1 | 77
2 | 77
3 | 36
I have tried using below but doesn't seem to work:
SELECT *, SUM(CASE WHEN diff IS NULL
OR diff <2 THEN 1 ELSE NULL END) OVER (ORDER BY t) AS period
FROM (SELECT *, t - lag(t, 1) OVER (ORDER BY t) AS diff
FROM t_series
) AS x;
Could anyone please suggest a fix.
Thanks in advance.
CodePudding user response:
I came up with this solution:
SELECT period, SUM(data) AS sum
FROM (
SELECT t, data, SUM(groups) OVER (ORDER BY t) AS period
FROM (
SELECT t, data,
CASE
WHEN diff IS NULL OR diff = 1 THEN 0
ELSE 1
END AS groups
FROM (
SELECT t, data, t - LAG(t) OVER (ORDER BY t) AS diff
FROM t_series
) d
) g -- your intermediate output
) p
GROUP BY period
ORDER BY period
;
Result:
period | sum
-------- -----
0 | 77
1 | 77
2 | 36
The only difference is that my period
starts with 0
, but I think it's ok