Home > Enterprise >  Oracle SQL: Generating a group id in a result set by country within a time boundary
Oracle SQL: Generating a group id in a result set by country within a time boundary

Time:12-14

I have the following result set ordered by ID, REGISTRATION_TS, COUNTRY

ID    CO REGISTRATION_TS    
----- -- -------------------
56053 CH 05/07/2022 20:57:47
56053 CH 05/07/2022 23:26:05
56053 CH 06/07/2022 03:40:18
56053 CH 06/07/2022 03:42:58
56053 DE 06/07/2022 07:50:21
56053 DE 12/07/2022 05:05:14
56053 DE 13/07/2022 12:43:06
56053 CH 26/07/2022 22:52:20
56053 CH 27/07/2022 04:05:14
56053 DE 27/07/2022 08:47:55
56053 DE 27/07/2022 15:34:32
86EBD SI 29/07/2022 18:05:11
86EBD SI 29/07/2022 18:13:21
86EBD AT 30/07/2022 07:35:15
86EBD DE 30/07/2022 07:35:15
86EBD AT 30/07/2022 07:38:06
86EBD AT 30/07/2022 07:38:06
86EBD AT 30/07/2022 07:46:16
86EBD AT 30/07/2022 07:46:16
86EBD SK 30/07/2022 13:14:45

And I would like to have an additional column grouping by country within a time period for an ID. In my case, for ID 56053, country CH between 05/07/2022 20:57:47 and 06/07/2022 03:42:58 should have a Group = 1, the next three rows for DE, should have Group = 2, the next two rows CH, Group should be 3. Once a new ID starts, the ID should be reset to 1.

The new result set should be:

ID    CO REGISTRATION_TS     GROUP
----- -- ------------------- -----
56053 CH 05/07/2022 20:57:47 1
56053 CH 05/07/2022 23:26:05 1
56053 CH 06/07/2022 03:40:18 1
56053 CH 06/07/2022 03:42:58 1
56053 DE 06/07/2022 07:50:21 2
56053 DE 12/07/2022 05:05:14 2
56053 DE 13/07/2022 12:43:06 2
56053 CH 26/07/2022 22:52:20 3
56053 CH 27/07/2022 04:05:14 3
56053 DE 27/07/2022 08:47:55 4
56053 DE 27/07/2022 15:34:32 4
86EBD SI 29/07/2022 18:05:11 1
86EBD SI 29/07/2022 18:13:21 1
86EBD AT 30/07/2022 07:35:15 2
86EBD DE 30/07/2022 07:35:15 3
86EBD AT 30/07/2022 07:38:06 4
86EBD AT 30/07/2022 07:38:06 4
86EBD AT 30/07/2022 07:46:16 4
86EBD AT 30/07/2022 07:46:16 4
86EBD SK 30/07/2022 13:14:45 5

I've been trying with analytical functions but I haven't hit the solution yet. My next option would be to code a PL/SQL block but I am sure this could be solved with pure SQL.

Oracle 19.0

CodePudding user response:

We can phrase this as a gaps and islands problem. One solution uses the difference in row numbers method:

WITH cte AS (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY REGISTRATION_TS) rn1,
                ROW_NUMBER() OVER (PARTITION BY ID, CO ORDER BY REGISTRATION_TS) rn2
    FROM yourTable t
),

SELECT ID, CO, REGISTRATION_TS,
       DENSE_RANK() OVER (PARTITION BY ID ORDER BY rn1 - rn2) "GROUP"
FROM cte
ORDER BY ID, REGISTRATION_TS;

CodePudding user response:

From Oracle 12, you can efficiently solve row-by-row pattern matching problems with MATCH_RECOGNIZE:

SELECT *
FROM   table_name
MATCH_RECOGNIZE (
  PARTITION BY id
  ORDER BY registration_ts
  MEASURES
    MATCH_NUMBER() AS mno
  ALL ROWS PER MATCH
  PATTERN (same_country )
  DEFINE same_country AS FIRST(country) = country
)

Which, for the sample data:

CREATE TABLE table_name (ID, country, REGISTRATION_TS) AS
SELECT '56053', 'CH', DATE '2022-07-05'   INTERVAL '20:57:47' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'CH', DATE '2022-07-05'   INTERVAL '23:26:05' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'CH', DATE '2022-07-06'   INTERVAL '03:40:18' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'CH', DATE '2022-07-06'   INTERVAL '03:42:58' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'DE', DATE '2022-07-06'   INTERVAL '07:50:21' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'DE', DATE '2022-07-12'   INTERVAL '05:05:14' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'DE', DATE '2022-07-13'   INTERVAL '12:43:06' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'CH', DATE '2022-07-26'   INTERVAL '22:52:20' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'CH', DATE '2022-07-27'   INTERVAL '04:05:14' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'DE', DATE '2022-07-27'   INTERVAL '08:47:55' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '56053', 'DE', DATE '2022-07-27'   INTERVAL '15:34:32' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'SI', DATE '2022-07-29'   INTERVAL '18:05:11' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'SI', DATE '2022-07-29'   INTERVAL '18:13:21' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'AT', DATE '2022-07-30'   INTERVAL '07:35:15' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'DE', DATE '2022-07-30'   INTERVAL '07:35:15' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'AT', DATE '2022-07-30'   INTERVAL '07:38:06' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'AT', DATE '2022-07-30'   INTERVAL '07:38:06' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'AT', DATE '2022-07-30'   INTERVAL '07:46:16' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'AT', DATE '2022-07-30'   INTERVAL '07:46:16' HOUR TO SECOND FROM DUAL UNION ALL
SELECT '86EBD', 'SK', DATE '2022-07-30'   INTERVAL '13:14:45' HOUR TO SECOND FROM DUAL;

Outputs:

ID REGISTRATION_TS MNO COUNTRY
56053 2022-07-05 20:57:47 1 CH
56053 2022-07-05 23:26:05 1 CH
56053 2022-07-06 03:40:18 1 CH
56053 2022-07-06 03:42:58 1 CH
56053 2022-07-06 07:50:21 2 DE
56053 2022-07-12 05:05:14 2 DE
56053 2022-07-13 12:43:06 2 DE
56053 2022-07-26 22:52:20 3 CH
56053 2022-07-27 04:05:14 3 CH
56053 2022-07-27 08:47:55 4 DE
56053 2022-07-27 15:34:32 4 DE
86EBD 2022-07-29 18:05:11 1 SI
86EBD 2022-07-29 18:13:21 1 SI
86EBD 2022-07-30 07:35:15 2 AT
86EBD 2022-07-30 07:35:15 3 DE
86EBD 2022-07-30 07:38:06 4 AT
86EBD 2022-07-30 07:38:06 4 AT
86EBD 2022-07-30 07:46:16 4 AT
86EBD 2022-07-30 07:46:16 4 AT
86EBD 2022-07-30 13:14:45 5 SK

fiddle

  • Related