Home > Net >  with SQL, how to drop cases after a long gap in a time series?
with SQL, how to drop cases after a long gap in a time series?

Time:11-01

my data looks something like this:

CASE_TIMESTAMP GROUP
0 2017-12-26 16:12:09 00:00 A
1 2017-12-26 16:12:44 00:00 A
2 2020-04-21 07:00:00 00:00 A
3 2020-07-01 00:05:35 00:00 A
4 2020-08-06 07:00:00 00:00 A
5 2020-08-06 07:00:00 00:00 A
6 2020-08-06 07:00:00 00:00 A
7 2020-08-25 07:00:00 00:00 B
8 2020-09-22 07:00:00 00:00 B
9 2020-09-22 07:00:00 00:00 B
10 2020-12-04 08:00:00 00:00 B
11 2020-12-04 08:00:00 00:00 B
12 2020-12-07 08:00:00 00:00 B
13 2020-12-07 08:00:00 00:00 B
14 2020-12-07 08:00:00 00:00 B
15 2020-12-08 08:00:00 00:00 B
16 2020-12-08 08:00:00 00:00 B
17 2020-12-08 08:00:00 00:00 B

Need to drop cases that occurred before a gap of more than one day, so in group a all cases before 2020-08-06 and in B all cases before 2020-12-07.

Think I need a window function, but don't know how to calculate gaps and then drop all before, any ideas?

PS.I'm on snowflake

CodePudding user response:

Using QUALIFY and windowed MAX to find the latest CASE_TIMESTAMP per GRR:

CREATE TABLE t(CASE_TIMESTAMP TIMESTAMP,    GRP VARCHAR)
AS
          SELECT '2017-12-26 16:12:09 00:00','A'
UNION ALL SELECT '2017-12-26 16:12:44 00:00','A'
UNION ALL SELECT '2020-04-21 07:00:00 00:00','A'
UNION ALL SELECT '2020-07-01 00:05:35 00:00','A'
UNION ALL SELECT '2020-08-06 07:00:00 00:00','A'
UNION ALL SELECT '2020-08-06 07:00:00 00:00','A'
UNION ALL SELECT '2020-08-06 07:00:00 00:00','A'
UNION ALL SELECT '2020-08-25 07:00:00 00:00','B'
UNION ALL SELECT '2020-09-22 07:00:00 00:00','B'
UNION ALL SELECT '2020-09-22 07:00:00 00:00','B'
UNION ALL SELECT '2020-12-04 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-04 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-07 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-07 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-07 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-08 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-08 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-08 08:00:00 00:00','B';

Query:

SELECT *
FROM t
QUALIFY CASE_TIMESTAMP >= MAX(CASE_TIMESTAMP) OVER(PARTITION BY GRP) 
                          - INTERVAL '1 days';

Output:

enter image description here

  • Related