Home > Enterprise >  Group a sequence of lines [SQL]
Group a sequence of lines [SQL]

Time:03-09

Is there a way to group a sequence of rows in SQL (MySQL 5.1.73).

Let me explain, I have a query that gives this:

hour start_date end_date
10 2022-02-01 10:11:18 2022-02-01 10:50:18
11 2022-02-01 11:30:31 2022-02-01 11:38:12
13 2022-02-17 13:55:09 2022-02-17 13:58:38
14 2022-02-17 14:51:09 2022-02-17 14:57:59

And I would like to convert it to this:

hour start_date end_date
10 2022-02-01 10:11:18 2022-02-01 11:38:12
13 2022-02-17 13:55:09 2022-02-17 14:57:59

Indeed, I would like to group all the lines whose hours follow each other.

My request is a grouping in hours, like this :

SELECT hour( date ) as hour, MIN(date) as start_date , MAX(date) as end_date
FROM test_tbl
GROUP BY hour( date ) ,  date( date ) 
order by date, hour( date ) ;

But after doing this query, I would like to group the lines whose hours follow each other (10,11 => 10)...

CodePudding user response:

EDIT: the following answer only works with MySQL version 8

with tbl_by_hour as (
    SELECT hour( date ) as hour, MIN(date) as start_date , MAX(date) as end_date
    FROM test_tbl
    GROUP BY hour( date ) ,  date( date ) 
    order by date, hour( date )
)
select 
    min(hour) as hour, 
    min(start_date) as start_date, 
    max(end_date) as end_date 
    from (
      select tab1.*,
      sum(case when prev_hour is null or prev_hour = hour - 1 then 0 else 1 end) over(order by hour) grp
      from (
        select hour, start_date, end_date, lag(hour) over(order by hour) prev_hour from tbl_by_hour
      ) as tab1
    ) as tab2
group by grp

CodePudding user response:

You can probably do something like this:

SELECT MIN(hours), dates, MIN(start_date), MAX(end_date), tn
FROM
(SELECT *,
       CEIL(rownum/5) AS tn
 FROM
(SELECT *,
       CASE WHEN dates=@dt
            AND hours=@hr 1
            THEN @rn := @rn 1
            WHEN dates=@dt
            AND hours > @hr 1
            THEN @rn := @rn 20
        ELSE @rn := 1
        END AS rownum,
       @dt := dates,
       @hr := hours
FROM       
 (SELECT hour(date) as hours, date(date) dates, 
        MIN(date) as start_date , MAX(date) as end_date
   FROM test_tbl t 
  GROUP BY dates, hours) v 
  CROSS JOIN (SELECT @rn := 0, @dt := NULL, @hr := 0) r
ORDER BY dates, hours) s
) w
GROUP BY dates, tn;
  1. I took your original query as base then made it as subquery.
  2. Then I CROSS JOIN with a subquery of variables where I'm attempting to generate a custom row numbering. The conditions of the row number are:
  • If it's on the same date and the next hour increment from previous is 1 then continue the numbering.
  • If it's on the same date and the next hour increment from previous more than 1 then pick-up the last number and increment it by 20.
  • Repeat the row numbering sequence if the date is different.
  1. After generating the row numbering, I convert to subquery then divide the row numbering by 5 and use ceiling (CIEL) function to somehow make them the same, effectively identifying (assuming) these rows with same CIEL(rownum/5) result as one group - this is where I felt it's not really convincing but it works anyhow.

  2. Lastly, I convert that to a subquery again and did the whole MIN(hours), dates, MIN(start_date), MAX(end_date), tn with GROUP BY dates, tn.

It's not a convincing solution because the final operation (generating of the tn column) is based on creativity and not something certain. I usually prefer a solution that covers all the possible scenarios with something concrete rather than creative. However, I did some extensive tests on the current query with more data variation and so far it's returning good results. Also, I do notice that you said your MySQL version is 5.1 so, I'm not really sure if this particular operation will work. Version 5.5 is probably the lowest version of MySQL fiddle that is available online.

Here's a demo fiddle

  • Related