my data looks something like this:
CASE_TIMESTAMP | GROUP | |
---|---|---|
0 | 2017-12-26 16:12:09 00:00 | A |
1 | 2017-12-26 16:12:44 00:00 | A |
2 | 2020-04-21 07:00:00 00:00 | A |
3 | 2020-07-01 00:05:35 00:00 | A |
4 | 2020-08-06 07:00:00 00:00 | A |
5 | 2020-08-06 07:00:00 00:00 | A |
6 | 2020-08-06 07:00:00 00:00 | A |
7 | 2020-08-25 07:00:00 00:00 | B |
8 | 2020-09-22 07:00:00 00:00 | B |
9 | 2020-09-22 07:00:00 00:00 | B |
10 | 2020-12-04 08:00:00 00:00 | B |
11 | 2020-12-04 08:00:00 00:00 | B |
12 | 2020-12-07 08:00:00 00:00 | B |
13 | 2020-12-07 08:00:00 00:00 | B |
14 | 2020-12-07 08:00:00 00:00 | B |
15 | 2020-12-08 08:00:00 00:00 | B |
16 | 2020-12-08 08:00:00 00:00 | B |
17 | 2020-12-08 08:00:00 00:00 | B |
Need to drop cases that occurred before a gap of more than one day, so in group a all cases before 2020-08-06 and in B all cases before 2020-12-07.
Think I need a window function, but don't know how to calculate gaps and then drop all before, any ideas?
PS.I'm on snowflake
CodePudding user response:
Using QUALIFY and windowed MAX to find the latest CASE_TIMESTAMP per GRR:
CREATE TABLE t(CASE_TIMESTAMP TIMESTAMP, GRP VARCHAR)
AS
SELECT '2017-12-26 16:12:09 00:00','A'
UNION ALL SELECT '2017-12-26 16:12:44 00:00','A'
UNION ALL SELECT '2020-04-21 07:00:00 00:00','A'
UNION ALL SELECT '2020-07-01 00:05:35 00:00','A'
UNION ALL SELECT '2020-08-06 07:00:00 00:00','A'
UNION ALL SELECT '2020-08-06 07:00:00 00:00','A'
UNION ALL SELECT '2020-08-06 07:00:00 00:00','A'
UNION ALL SELECT '2020-08-25 07:00:00 00:00','B'
UNION ALL SELECT '2020-09-22 07:00:00 00:00','B'
UNION ALL SELECT '2020-09-22 07:00:00 00:00','B'
UNION ALL SELECT '2020-12-04 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-04 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-07 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-07 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-07 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-08 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-08 08:00:00 00:00','B'
UNION ALL SELECT '2020-12-08 08:00:00 00:00','B';
Query:
SELECT *
FROM t
QUALIFY CASE_TIMESTAMP >= MAX(CASE_TIMESTAMP) OVER(PARTITION BY GRP)
- INTERVAL '1 days';
Output: