Grouping with "checkpoints" in Oracle database-CodePudding

I have the following data as a DATE and MIXED_NAME. The MIXED_NAME has records with unique values like "Unique-Value-123" and with typical values all of which are "Typical-Value". The raw data is below. I want them group by DATE, MIXED_NAME, but in the case that in the same day I have "Typical-Value", "Unique-Value-123" and "Typical-Value", I need them to be split into groups by "Unique-Value-123" which plays like a checkpoint. So I want to get a group before and another group after.

I use Oracle DB. So far I came to the idea of adding rownum of the previous "Unique-Value-123" to all of the typicals to get a bunch of "Typical-Value-123" which conveniently will group by. But I can't find a function that in single action will perform case LAG(SUBSTR(MIXED_NAME, INSTR(MIXED_NAME, '_-_') 1)) over (order by DATE) action for a current row based on a result of this case for a previous row.

How to organize such grouping?

UPD: Replaced the image with tables

Raw data:

DATE	MIXED_NAME
December 14, 2013 at 6:34:00 PM	Typical-Value
December 14, 2013 at 6:35:00 PM	Typical-Value
December 15, 2013 at 9:00:00 PM	Typical-Value
December 15, 2013 at 10:00:59 PM	Unique-Value-123
December 25, 2013 at 1:31:13 AM	Typical-Value
December 28, 2013 at 1:40:42 AM	Typical-Value
December 30, 2013 at 4:41:41 PM	Typical-Value
December 31, 2013 at 1:28:05 PM	Typical-Value
December 31, 2013 at 11:00:00 PM	Typical-Value
December 31, 2013 at 11:59:59 PM	Unique-Value-122
January 1, 2014 at 1:00:47 AM	Typical-Value
January 1, 2014 at 1:02:05 AM	Typical-Value
January 1, 2014 at 1:02:05 AM	Typical-Value
January 1, 2014 at 1:32:47 AM	Typical-Value

Expected result ("Typical-Value"-s are grouped):

DATE	MIXED_NAME
December 14, 2013 at 6:34:00 PM	Typical-Value
December 15, 2013 at 10:00:59 PM	Unique-Value-123
December 15, 2013 at 11:00:00 PM	Typical-Value
December 31, 2013 at 11:59:59 PM	Unique-Value-122
January 1, 2014 at 1:00:47 AM	Typical-Value

CodePudding user response：

Maybe there is a more clever way using analytical functions, but this solution is fairly simple as well.

It assigns a grouping number, by counting how many non-typical-value rows you have before the current row. And then it groups by that grouping number.

with
  VW_GROUP_NUMBER as (
    select
      v.*,
      ( select count(*) 
        from YOURTABLE r 
        where r.MIXED_NAME <> 'Typical-Value' 
          and r."DATE" <= v."DATE"
      ) as GROUP_NUMBER
    from
      YOURTABLE v)
select
  max(gn."DATE") as GROUPDATE,
  gn.MIXED_NAME
from
  VW_GROUP_NUMBER gn
group by
  gn.GROUP_NUMBER,         
  gn.MIXED_NAME
order by
  1;

CodePudding user response：

You can do this using the tabibitosan method, like so:

WITH your_table AS (SELECT to_date('14/12/2013 18:34:00', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('14/12/2013 18:35:00', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('15/12/2013 21:00:00', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('15/12/2013 22:00:59', 'dd/mm/yyyy hh24:mi:ss') dt, 'Unique-Value-123' mixed_name FROM dual UNION ALL
                    SELECT to_date('25/12/2013 01:31:13', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('28/12/2013 01:40:42', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('30/12/2013 16:41:41', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('31/12/2013 13:28:05', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('31/12/2013 23:00:00', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('31/12/2013 23:59:59', 'dd/mm/yyyy hh24:mi:ss') dt, 'Unique-Value-122' mixed_name FROM dual UNION ALL
                    SELECT to_date('01/01/2014 01:00:47', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('01/01/2014 01:02:05', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('01/01/2014 01:02:05', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('01/01/2014 01:32:47', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('02/01/2014 01:40:52', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual UNION ALL
                    SELECT to_date('02/01/2014 01:40:52', 'dd/mm/yyyy hh24:mi:ss') dt, 'Unique-Value-124' mixed_name FROM dual UNION ALL
                    SELECT to_date('02/01/2014 01:45:13', 'dd/mm/yyyy hh24:mi:ss') dt, 'Typical-Value' mixed_name FROM dual),
    tabibitosan AS (SELECT dt,
                           mixed_name,
                           row_number() OVER (ORDER BY dt, CASE WHEN mixed_name = 'Typical-Value' THEN 1 ELSE 0 END) - row_number() OVER (PARTITION BY mixed_name ORDER BY dt, CASE WHEN mixed_name = 'Typical-Value' THEN 1 ELSE 0 END) grp
                    FROM   your_table)
SELECT to_char(MIN(dt), 'fmMonth dd, yyyy "at" hh:mi:ss AM') dt,
       mixed_name
FROM   tabibitosan
GROUP BY mixed_name,
         grp
ORDER BY dt;

DT                                MIXED_NAME
--------------------------------- ----------------
December 14, 2013 at 6:34:0 PM    Typical-Value
December 15, 2013 at 10:0:59 PM   Unique-Value-123
December 25, 2013 at 1:31:13 AM   Typical-Value
December 31, 2013 at 11:59:59 PM  Unique-Value-122
January 1, 2014 at 1:0:47 AM      Typical-Value
January 2, 2014 at 1:40:52 AM     Unique-Value-124
January 2, 2014 at 1:40:52 AM     Typical-Value

Tabibitosan works by numbering the rows in the main ordered set and again in the ordered subsets, and then comparing the row numbers of each subset to the main set.

Rows in the subset that are consecutive will have the same difference, which is represented by the grp column in the above query. When the difference changes, you know a new group has started, and then it's simply a matter of finding the minimum value to find the start value.

You'll note that I've taken into account the fact that rows can have the same date and time - if that happens, I've assumed that the Typical-Value row starts a new group as opposed to being the end of the previous group. If that's not the case, then change the ELSE 0 to ELSE 2 in the ORDER BY clause of the row_number() analytic fuctions.