Home > Software engineering >  Count if a value appears in consecutive rows
Count if a value appears in consecutive rows

Time:12-22

I've got this table:

  ID           Date            Event
  ----------------------------------------
  123        2022-05-01         OCT
  123        2022-05-04         OCT
  123        2022-05-05         OCT
  123        2022-05-07         OCT
  123        2022-05-08         GRE
  123        2022-05-10         GRE
  123        2022-05-12         OCT
  123        2022-05-15         OCT

What I wish is to count the number of events appearing in a row, in order to check if the event 'OCT' has happened 4 times in a row. for example:

Desirable output:

  ID           Date            Event         Order_Event
  --------------------------------------------------------
  123        2022-05-01         OCT               1
  123        2022-05-04         OCT               2
  123        2022-05-05         OCT               3
  123        2022-05-07         OCT               4
  123        2022-05-08         GRE               1
  123        2022-05-10         GRE               2
  123        2022-05-12         OCT               1  
  123        2022-05-15         OCT               2

What I have tried is the row number() but it gives me that:

  ID           Date            Event         Order_Event
  --------------------------------------------------------
  123        2022-05-01         OCT               1
  123        2022-05-04         OCT               2
  123        2022-05-05         OCT               3
  123        2022-05-07         OCT               4
  123        2022-05-08         GRE               1
  123        2022-05-10         GRE               2
  123        2022-05-12         OCT               5  
  123        2022-05-15         OCT               6

you see? I don't want it to count the events 'OCT' in the 12th and 15th of may as five and six.. i want to see only if each event happened 4 times in a row (by date)

Thank you!

CodePudding user response:

Try this:

DECLARE @DataSource TABLE
(
    [ID] INT
   ,[Date] DATE
   ,[Event] VARCHAR(3)
);

INSERT INTO @DataSource ([ID], [Date], [Event])
VALUES (123, '2022-05-01', 'OCT')
      ,(123, '2022-05-04', 'OCT')
      ,(123, '2022-05-05', 'OCT')
      ,(123, '2022-05-07', 'OCT')
      ,(123, '2022-05-08', 'GRE')
      ,(123, '2022-05-10', 'GRE')
      ,(123, '2022-05-12', 'OCT')
      ,(123, '2022-05-15', 'OCT');

SELECT [ID], [Date], [Event]
      ,ROW_NUMBER() OVER (PARTITION BY grp ORDER BY [Date]) AS [ Order_Event]
FROM
(
    SELECT *
          ,ROW_NUMBER() OVER(ORDER BY [Date]) - ROW_NUMBER() OVER(ORDER BY [Event], [Date]) AS grp
    FROM @DataSource
) DS
ORDER BY [Date];

enter image description here

CodePudding user response:

One option here is to run two row numbers:

WITH sub AS
(SELECT id, date, event, 
  ROW_NUMBER() OVER(PARTITION BY id ORDER BY date)
    AS RowNr1,
  ROW_NUMBER() OVER (PARTITION BY id, event ORDER BY date) AS RowNr2
FROM yourtable)
SELECT id, date, event,
    ROW_NUMBER() OVER (PARTITION BY id, RowNr1 - RowNr2 ORDER BY date) AS Order_Event
FROM sub
ORDER BY
    id,
    date;

The idea is to build one row number per id and one per id and event, both sorted by date. Then we do the partition by the difference of those row numbers. It's difficult to explain in words only, better see the fiddle here:

db<>fiddle

There you can replicate the result of the subquery with the two different row numbers and replicate how it works.

Side note: I recommend to rename the column "date" because this is a SQL key word and also not really meaningful. Better use something like "eventDate".

CodePudding user response:

You can define groups for your events by implementing a count on when the data changes using lag, and then number these individual groups sequentially:

with groups as (
  select *,
    Sum(case when Event = prevEvent then 0 else 1 end) over(partition by Id order by date) gp
  from (
    select *, Lag(Event) over(partition by Id order by date) prevEvent
    from t
  )t
)
select Id, Date, Event, 
    Row_Number() over(partition by Id, gp order by date) Order_Event
from groups
order by Id, Date;

Demo Fiddle

  • Related