I have data in the following format:
ID | Period | Date |
---|---|---|
1 | 1 | 2020-09-05 |
1 | 2 | 2020-10-01 |
1 | 3 | 2020-10-30 |
2 | 2 | 2020-10-01 |
2 | 4 | 2020-12-05 |
2 | 6 | 2021-02-05 |
3 | 1 | 2020-09-05 |
I want to calculate a streak grouping by both ID and period. The first date instance for each group should start at one, and if the date in the next row is less than 30 days from the first row, 1 to the streak counter. This should reset to 1 if the next row is 30 or more days.
This is my desired output:
ID | Period | Date | Counter |
---|---|---|---|
1 | 1 | 2020-09-05 | 1 |
1 | 2 | 2020-10-01 | 2 |
1 | 3 | 2020-10-30 | 3 |
2 | 2 | 2020-10-01 | 1 |
2 | 4 | 2020-12-05 | 1 |
2 | 6 | 2021-02-05 | 1 |
3 | 1 | 2020-09-05 | 1 |
I'm completely stuck on this - any help would be greatly appreciated. I'm using Microsoft SQL server.
EDIT: Actually using Microsoft SQL Server
CodePudding user response:
From Oracle 12c, you can use MATCH_RECOGNIZE
to perform a row-by-row comparison:
SELECT id,
period,
"DATE",
counter
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY id
ORDER BY "DATE"
MEASURES
COUNT(*) AS counter
ALL ROWS PER MATCH
PATTERN (any_date streak_date*)
DEFINE
streak_date AS "DATE" <= PREV("DATE") INTERVAL '30' DAY
-- AND PREV(period) < period
)
Which, for the sample data:
CREATE TABLE table_name (ID, Period, "DATE") AS
SELECT 1, 1, DATE '2020-09-05' FROM DUAL UNION ALL
SELECT 1, 2, DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 1, 3, DATE '2020-10-30' FROM DUAL UNION ALL
SELECT 2, 2, DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 2, 4, DATE '2020-12-05' FROM DUAL UNION ALL
SELECT 2, 6, DATE '2021-02-05' FROM DUAL UNION ALL
SELECT 3, 1, DATE '2020-09-05' FROM DUAL
Note: DATE
is a reserved word and it is bad practice to use it as a column name.
Outputs:
ID PERIOD DATE COUNTER 1 1 2020-09-05 00:00:00 1 1 2 2020-10-01 00:00:00 2 1 3 2020-10-30 00:00:00 3 2 2 2020-10-01 00:00:00 1 2 4 2020-12-05 00:00:00 1 2 6 2021-02-05 00:00:00 1 3 1 2020-09-05 00:00:00 1
db<>fiddle here