Group by rows which are in sequence

Time:11-26

Suppose I have a table like this:

PASSENGER  CITY      DATE
43         NEW YORK  1-Jan-21
44         CHICAGO   4-Jan-21
43         NEW YORK  2-Jan-21
43         NEW YORK  3-Jan-21
44         ROME      5-Jan-21
43         LONDON    4-Jan-21
44         CHICAGO   6-Jan-21
44         CHICAGO   7-Jan-21

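How would I group by the PASSENGER and CITY columns in sequence, so that only consecutive rows for the same passenger and city are counted together, to get a result like the one below?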

PASSENGER  CITY      COUNT
43         NEW YORK  3
44         CHICAGO   1
44         ROME      1
43         LONDON    1
44         CHICAGO   2

CodePudding user response:

From Oracle 12, you can use MATCH_RECOGNIZE:

SELECT *
FROM   table_name
MATCH_RECOGNIZE (
  PARTITION BY passenger
  ORDER     BY "DATE"
  MEASURES
    FIRST(city) AS city,
    COUNT(*)    AS count
  PATTERN (same_city+)
  DEFINE
    same_city AS FIRST(city) = city
);

Which, for the sample data:

CREATE TABLE table_name (PASSENGER, CITY, "DATE") AS
SELECT 43, 'NEW YORK',  DATE '2021-01-01' FROM DUAL UNION ALL
SELECT 44, 'CHICAGO',   DATE '2021-01-04' FROM DUAL UNION ALL
SELECT 43, 'NEW YORK',  DATE '2021-01-02' FROM DUAL UNION ALL
SELECT 43, 'NEW YORK',  DATE '2021-01-03' FROM DUAL UNION ALL
SELECT 44, 'ROME',      DATE '2021-01-05' FROM DUAL UNION ALL
SELECT 43, 'LONDON',    DATE '2021-01-04' FROM DUAL UNION ALL
SELECT 44, 'CHICAGO',   DATE '2021-01-06' FROM DUAL UNION ALL
SELECT 44, 'CHICAGO',   DATE '2021-01-07' FROM DUAL

Outputs:

PASSENGER  CITY      COUNT
43         NEW YORK  3
43         LONDON    1
44         CHICAGO   1
44         ROME      1
44         CHICAGO   2
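
To make the intended grouping concrete outside the database, here is a minimal Python sketch of the same idea: sort by passenger and date, then count runs of adjacent rows that share a city. The function name `count_runs` and the tuple layout are illustrative assumptions, not part of the original answer, and this is only a model of what the query computes, not how Oracle evaluates it.

```python
from datetime import date
from itertools import groupby

# Sample data from the question: (passenger, city, date)
rows = [
    (43, "NEW YORK", date(2021, 1, 1)),
    (44, "CHICAGO", date(2021, 1, 4)),
    (43, "NEW YORK", date(2021, 1, 2)),
    (43, "NEW YORK", date(2021, 1, 3)),
    (44, "ROME", date(2021, 1, 5)),
    (43, "LONDON", date(2021, 1, 4)),
    (44, "CHICAGO", date(2021, 1, 6)),
    (44, "CHICAGO", date(2021, 1, 7)),
]


def count_runs(rows):
    """Return (passenger, city, count) for each run of consecutive same-city rows."""
    # Rough equivalent of PARTITION BY passenger ORDER BY "DATE"
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    result = []
    # groupby only merges adjacent rows with equal keys, so a city that
    # reappears later for the same passenger starts a new run
    for (passenger, city), run in groupby(ordered, key=lambda r: (r[0], r[1])):
        result.append((passenger, city, sum(1 for _ in run)))
    return result


for row in count_runs(rows):
    print(*row)
```

For the sample data this yields the same five groups as the query, with CHICAGO appearing twice for passenger 44 because ROME interrupts the run.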

If the input result set is already ordered (note that a table itself should be considered unordered) and you want to keep that order, capture ROWNUM and order by it:

SELECT *
FROM   (SELECT t.*, ROWNUM AS rn FROM table_name t)
MATCH_RECOGNIZE (
  PARTITION BY passenger
  ORDER     BY RN
  MEASURES
    FIRST(rn)     AS rn,
    FIRST("DATE") AS "DATE",
    FIRST(city)   AS city,
    COUNT(*)      AS count
  PATTERN (same_city+)
  DEFINE
    same_city AS FIRST(city) = city
)
ORDER BY rn

Outputs:

PASSENGER  RN  DATE       CITY      COUNT
43         1   01-JAN-21  NEW YORK  3
44         2   04-JAN-21  CHICAGO   1
44         5   05-JAN-21  ROME      1
43         6   04-JAN-21  LONDON    1
44         7   06-JAN-21  CHICAGO   2


CodePudding user response:

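One way to deal with a gaps-and-islands problem like this is to flag every row that starts a new island, turn those flags into a ranking with a running sum, and then group on that ranking as well.

The query below treats a row as continuing an island only when its date is exactly one day after the previous row for the same passenger and city.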

SELECT PASSENGER, CITY
, COUNT(*) AS "Count" 
-- , MIN("DATE") AS StartDate
-- , MAX("DATE") AS EndDate
FROM (
  SELECT q1.*
  , SUM(gap) OVER (PARTITION BY PASSENGER ORDER BY "DATE") as Rnk
  FROM (
    SELECT PASSENGER, CITY, "DATE"
    , CASE
      WHEN 1 = TRUNC("DATE")
             - TRUNC(LAG("DATE") 
                     OVER (PARTITION BY PASSENGER, CITY ORDER BY "DATE")) 
      THEN 0 ELSE 1 END as gap
    FROM table_name t
  ) q1
) q2
GROUP BY PASSENGER, CITY, Rnk
ORDER BY MIN("DATE"), PASSENGER

Outputs:

PASSENGER  CITY      Count
43         NEW YORK  3
43         LONDON    1
44         CHICAGO   1
44         ROME      1
44         CHICAGO   2
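
The flag-and-running-sum logic of this query can be traced step by step in a short Python sketch. The names `count_islands`, `last_seen` and `running` are illustrative assumptions, and the sketch assumes at most one row per passenger per day; it mirrors the query's logic rather than Oracle's execution.

```python
from collections import Counter
from datetime import date, timedelta

# Sample data from the question: (passenger, city, date)
rows = [
    (43, "NEW YORK", date(2021, 1, 1)),
    (44, "CHICAGO", date(2021, 1, 4)),
    (43, "NEW YORK", date(2021, 1, 2)),
    (43, "NEW YORK", date(2021, 1, 3)),
    (44, "ROME", date(2021, 1, 5)),
    (43, "LONDON", date(2021, 1, 4)),
    (44, "CHICAGO", date(2021, 1, 6)),
    (44, "CHICAGO", date(2021, 1, 7)),
]


def count_islands(rows):
    """Return (passenger, city, count) per island, ordered by first date then passenger."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    last_seen = {}       # (passenger, city) -> previous date, like LAG(...) per passenger and city
    running = Counter()  # passenger -> running SUM(gap), i.e. the Rnk column
    islands = {}         # (passenger, city, rnk) -> [first date, row count]
    for passenger, city, day in ordered:
        prev = last_seen.get((passenger, city))
        # gap = 0 only when the previous row for this passenger and city was exactly one day earlier
        gap = 0 if prev is not None and day - prev == timedelta(days=1) else 1
        last_seen[(passenger, city)] = day
        running[passenger] += gap
        entry = islands.setdefault((passenger, city, running[passenger]), [day, 0])
        entry[1] += 1
    # Rough equivalent of ORDER BY MIN("DATE"), PASSENGER
    ranked = sorted(islands.items(), key=lambda kv: (kv[1][0], kv[0][0]))
    return [(p, c, n) for (p, c, _rnk), (_first, n) in ranked]


for row in count_islands(rows):
    print(*row)
```

One consequence of the one-day test: if a passenger has rows for the same city on 1 January and 3 January with nothing in between, this approach counts them as two islands, whereas the MATCH_RECOGNIZE query would count them as one run of two rows.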
