I have the following result set ordered by ID, REGISTRATION_TS, COUNTRY
ID CO REGISTRATION_TS
----- -- -------------------
56053 CH 05/07/2022 20:57:47
56053 CH 05/07/2022 23:26:05
56053 CH 06/07/2022 03:40:18
56053 CH 06/07/2022 03:42:58
56053 DE 06/07/2022 07:50:21
56053 DE 12/07/2022 05:05:14
56053 DE 13/07/2022 12:43:06
56053 CH 26/07/2022 22:52:20
56053 CH 27/07/2022 04:05:14
56053 DE 27/07/2022 08:47:55
56053 DE 27/07/2022 15:34:32
86EBD SI 29/07/2022 18:05:11
86EBD SI 29/07/2022 18:13:21
86EBD AT 30/07/2022 07:35:15
86EBD DE 30/07/2022 07:35:15
86EBD AT 30/07/2022 07:38:06
86EBD AT 30/07/2022 07:38:06
86EBD AT 30/07/2022 07:46:16
86EBD AT 30/07/2022 07:46:16
86EBD SK 30/07/2022 13:14:45
And I would like to have an additional column grouping by country within a time period for an ID. In my case, for ID 56053, country CH between 05/07/2022 20:57:47 and 06/07/2022 03:42:58 should have a Group = 1, the next three rows for DE, should have Group = 2, the next two rows CH, Group should be 3. Once a new ID starts, the ID should be reset to 1.
The new result set should be:
ID CO REGISTRATION_TS GROUP
----- -- ------------------- -----
56053 CH 05/07/2022 20:57:47 1
56053 CH 05/07/2022 23:26:05 1
56053 CH 06/07/2022 03:40:18 1
56053 CH 06/07/2022 03:42:58 1
56053 DE 06/07/2022 07:50:21 2
56053 DE 12/07/2022 05:05:14 2
56053 DE 13/07/2022 12:43:06 2
56053 CH 26/07/2022 22:52:20 3
56053 CH 27/07/2022 04:05:14 3
56053 DE 27/07/2022 08:47:55 4
56053 DE 27/07/2022 15:34:32 4
86EBD SI 29/07/2022 18:05:11 1
86EBD SI 29/07/2022 18:13:21 1
86EBD AT 30/07/2022 07:35:15 2
86EBD DE 30/07/2022 07:35:15 3
86EBD AT 30/07/2022 07:38:06 4
86EBD AT 30/07/2022 07:38:06 4
86EBD AT 30/07/2022 07:46:16 4
86EBD AT 30/07/2022 07:46:16 4
86EBD SK 30/07/2022 13:14:45 5
I've been trying with analytical functions but I haven't hit the solution yet. My next option would be to code a PL/SQL block but I am sure this could be solved with pure SQL.
Oracle 19.0
CodePudding user response:
We can phrase this as a gaps and islands problem. One solution uses the difference in row numbers method:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY REGISTRATION_TS) rn1,
ROW_NUMBER() OVER (PARTITION BY ID, CO ORDER BY REGISTRATION_TS) rn2
FROM yourTable t
),
SELECT ID, CO, REGISTRATION_TS,
DENSE_RANK() OVER (PARTITION BY ID ORDER BY rn1 - rn2) "GROUP"
FROM cte
ORDER BY ID, REGISTRATION_TS;
CodePudding user response:
From Oracle 12, you can efficiently solve row-by-row pattern matching problems with MATCH_RECOGNIZE
:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY id
ORDER BY registration_ts
MEASURES
MATCH_NUMBER() AS mno
ALL ROWS PER MATCH
PATTERN (same_country )
DEFINE same_country AS FIRST(country) = country
)
Which, for the sample data:
CREATE TABLE table_name (ID, country, REGISTRATION_TS) AS
SELECT '56053', 'CH', DATE '2022-07-05' INTERVAL '20:57:47' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'CH', DATE '2022-07-05' INTERVAL '23:26:05' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'CH', DATE '2022-07-06' INTERVAL '03:40:18' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'CH', DATE '2022-07-06' INTERVAL '03:42:58' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'DE', DATE '2022-07-06' INTERVAL '07:50:21' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'DE', DATE '2022-07-12' INTERVAL '05:05:14' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'DE', DATE '2022-07-13' INTERVAL '12:43:06' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'CH', DATE '2022-07-26' INTERVAL '22:52:20' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'CH', DATE '2022-07-27' INTERVAL '04:05:14' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'DE', DATE '2022-07-27' INTERVAL '08:47:55' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'DE', DATE '2022-07-27' INTERVAL '15:34:32' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'SI', DATE '2022-07-29' INTERVAL '18:05:11' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'SI', DATE '2022-07-29' INTERVAL '18:13:21' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'AT', DATE '2022-07-30' INTERVAL '07:35:15' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'DE', DATE '2022-07-30' INTERVAL '07:35:15' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'AT', DATE '2022-07-30' INTERVAL '07:38:06' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'AT', DATE '2022-07-30' INTERVAL '07:38:06' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'AT', DATE '2022-07-30' INTERVAL '07:46:16' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'AT', DATE '2022-07-30' INTERVAL '07:46:16' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'SK', DATE '2022-07-30' INTERVAL '13:14:45' HOUR TO SECOND FROM DUAL;
Outputs:
ID | REGISTRATION_TS | MNO | COUNTRY |
---|---|---|---|
56053 | 2022-07-05 20:57:47 | 1 | CH |
56053 | 2022-07-05 23:26:05 | 1 | CH |
56053 | 2022-07-06 03:40:18 | 1 | CH |
56053 | 2022-07-06 03:42:58 | 1 | CH |
56053 | 2022-07-06 07:50:21 | 2 | DE |
56053 | 2022-07-12 05:05:14 | 2 | DE |
56053 | 2022-07-13 12:43:06 | 2 | DE |
56053 | 2022-07-26 22:52:20 | 3 | CH |
56053 | 2022-07-27 04:05:14 | 3 | CH |
56053 | 2022-07-27 08:47:55 | 4 | DE |
56053 | 2022-07-27 15:34:32 | 4 | DE |
86EBD | 2022-07-29 18:05:11 | 1 | SI |
86EBD | 2022-07-29 18:13:21 | 1 | SI |
86EBD | 2022-07-30 07:35:15 | 2 | AT |
86EBD | 2022-07-30 07:35:15 | 3 | DE |
86EBD | 2022-07-30 07:38:06 | 4 | AT |
86EBD | 2022-07-30 07:38:06 | 4 | AT |
86EBD | 2022-07-30 07:46:16 | 4 | AT |
86EBD | 2022-07-30 07:46:16 | 4 | AT |
86EBD | 2022-07-30 13:14:45 | 5 | SK |