sqlite count the number of consecutive matching rows while excluding them

Time:02-18

Suppose I have an SQLite table called messages with these columns:

| id | type | text | time |
---------------------------

id is the primary key and is unique. Imagine that I have 5 rows in the following order (representing them as a JSON array for clarity):

[
  {
    id: 'first',
    type: 'random',
    text: 'hey there',
    time: '2022-02-15T01:47:25.581'
  },
  {
    id: 'second',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:48:25.581'
  },
  {
    id: 'third',
    type: 'new_item',
    text: 'new socks',
    time: '2022-02-15T01:49:25.581'
  },
  {
    id: 'fourth',
    type: 'random',
    text: 'what time is it',
    time: '2022-02-15T01:50:25.581'
  },
  {
    id: 'fifth',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:51:25.581'
  }
]

I want to query these messages so that each run of consecutive rows with type new_item is collapsed into a single row, and that row also reports how many rows were in the run. Concretely, I would like output that carries the information below (it doesn't have to use this exact schema; this is just an example of what I want):

[
  {
    id: 'first',
    type: 'random',
    text: 'hey there',
    time: '2022-02-15T01:47:25.581'
  },
  {
    id: 'second',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:48:25.581',
    numConsecutiveItems: 2
  },
  {
    id: 'fourth',
    type: 'random',
    text: 'what time is it',
    time: '2022-02-15T01:50:25.581'
  },
  {
    id: 'fifth',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:51:25.581',
    numConsecutiveItems: 1
  }
]

Notice that the message with id third is absent from the final output because it has type new_item and immediately follows another new_item message; for the same reason, the message with id second has numConsecutiveItems of 2. Likewise, the message with id fifth is present because it doesn't immediately follow another new_item message, so its numConsecutiveItems is 1. Can I achieve this with a single query, ordered by the time column? That is my strong preference; if it isn't possible, then ideally no more than two queries. Thank you!
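To pin down the expected result before turning to SQL, the collapsing rule can be written as a short Python reference. This is only a sketch of the desired semantics, not a proposed solution: the function name collapse_new_items and the variable names are made up for illustration, and it assumes the time strings sort correctly as plain text, which holds for the ISO-style timestamps in the sample.

```python
from itertools import groupby

# The five sample rows from the question.
messages = [
    {"id": "first", "type": "random", "text": "hey there", "time": "2022-02-15T01:47:25.581"},
    {"id": "second", "type": "new_item", "text": "new shoe", "time": "2022-02-15T01:48:25.581"},
    {"id": "third", "type": "new_item", "text": "new socks", "time": "2022-02-15T01:49:25.581"},
    {"id": "fourth", "type": "random", "text": "what time is it", "time": "2022-02-15T01:50:25.581"},
    {"id": "fifth", "type": "new_item", "text": "new shoe", "time": "2022-02-15T01:51:25.581"},
]


def collapse_new_items(rows):
    """Replace each run of consecutive 'new_item' rows with its first row plus a count."""
    ordered = sorted(rows, key=lambda r: r["time"])
    result = []
    # groupby yields one group per run of adjacent rows sharing the same type.
    for row_type, run in groupby(ordered, key=lambda r: r["type"]):
        run = list(run)
        if row_type == "new_item":
            result.append({**run[0], "numConsecutiveItems": len(run)})
        else:
            # Rows of any other type are passed through unchanged.
            result.extend(run)
    return result


collapsed = collapse_new_items(messages)
for row in collapsed:
    print(row["id"], row.get("numConsecutiveItems"))
```

Any SQL answer should return the same ids in the same order, with the same counts on the new_item rows.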

Answer:

Use window functions to group consecutive rows of the same type and count the rows in each group; the count is reported only for groups of type 'new_item':

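WITH cte AS (
  SELECT *,
         COUNT(*) OVER (PARTITION BY grp) AS count,                 -- number of rows in the run
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY time) AS rn   -- position within the run
  FROM (
    -- Running total of the flags: all rows of one run share the same grp value.
    SELECT *, SUM(flag) OVER (ORDER BY time) AS grp
    FROM (
      -- flag is 1 when the type differs from the previous row's type, i.e. a new run starts.
      SELECT *, (type <> LAG(type, 1, '') OVER (ORDER BY time)) AS flag
      FROM messages
    )
  )
)
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END AS numConsecutiveItems
FROM cte
-- Keep every non-new_item row, and only the first row of each new_item run.
WHERE numConsecutiveItems IS NULL OR rn = 1
ORDER BY time;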
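As a quick way to try the window-function query, the following Python sketch loads the five sample rows into an in-memory database and runs it. It assumes the table is named messages, as in the question, and that the SQLite library bundled with Python is 3.25.0 or newer; the constant name WINDOW_QUERY is only illustrative.

```python
import sqlite3

# In-memory database holding the five sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id TEXT PRIMARY KEY, type TEXT, text TEXT, time TEXT)"
)
conn.executemany(
    "INSERT INTO messages (id, type, text, time) VALUES (?, ?, ?, ?)",
    [
        ("first", "random", "hey there", "2022-02-15T01:47:25.581"),
        ("second", "new_item", "new shoe", "2022-02-15T01:48:25.581"),
        ("third", "new_item", "new socks", "2022-02-15T01:49:25.581"),
        ("fourth", "random", "what time is it", "2022-02-15T01:50:25.581"),
        ("fifth", "new_item", "new shoe", "2022-02-15T01:51:25.581"),
    ],
)

# The query from the answer, with the table name set to messages.
WINDOW_QUERY = """
WITH cte AS (
  SELECT *,
         COUNT(*) OVER (PARTITION BY grp) AS count,
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY time) AS rn
  FROM (
    SELECT *, SUM(flag) OVER (ORDER BY time) AS grp
    FROM (
      SELECT *, (type <> LAG(type, 1, '') OVER (ORDER BY time)) AS flag
      FROM messages
    )
  )
)
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END AS numConsecutiveItems
FROM cte
WHERE numConsecutiveItems IS NULL OR rn = 1
ORDER BY time;
"""

rows = conn.execute(WINDOW_QUERY).fetchall()
for row in rows:
    print(row)
```

With the sample data this returns the rows first, second, fourth and fifth; numConsecutiveItems is 2 for second, 1 for fifth, and NULL (None in Python) for the two random rows.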

For versions of SQLite that do not support window functions (they were added in 3.25.0), use aggregations and correlated subqueries to simulate them:

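WITH
  -- For each row, compare its type with the type of the latest earlier row.
  -- SQLite takes the bare column t2.type from the row that holds MAX(t2.time).
  prev_types AS (
    SELECT t1.*, t1.type <> COALESCE(t2.type, '') AS flag, MAX(t2.time) AS max_time
    FROM messages t1 LEFT JOIN messages t2
    ON t2.time < t1.time
    GROUP BY t1.id
  ),
  -- Running total of the flags: all rows of one run share the same grp value.
  sum_flags AS (
    SELECT pt1.*, SUM(pt2.flag) AS grp
    FROM prev_types pt1 INNER JOIN prev_types pt2
    ON pt2.time <= pt1.time
    GROUP BY pt1.id
  ),
  -- count = number of rows in the run, rn = position of the row within its run.
  cte AS (
    SELECT sf1.*,
           (SELECT COUNT(*) FROM sum_flags sf2 WHERE sf2.grp = sf1.grp) AS count,
           (SELECT COUNT(*) FROM sum_flags sf2 WHERE sf2.grp = sf1.grp AND sf2.time <= sf1.time) AS rn
    FROM sum_flags sf1
  )
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END AS numConsecutiveItems
FROM cte
-- Keep every non-new_item row, and only the first row of each new_item run.
WHERE numConsecutiveItems IS NULL OR rn = 1
ORDER BY time;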
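To decide which variant applies, the SQLite library version can be checked at runtime. The sketch below does that from Python and then runs the fallback query against the sample rows, so its output can be compared with the window-function version. It assumes the table is named messages; HAS_WINDOW_FUNCTIONS, FALLBACK_QUERY and load_sample are illustrative names, not part of any API.

```python
import sqlite3

# Window functions arrived in SQLite 3.25.0; older libraries need the fallback query.
HAS_WINDOW_FUNCTIONS = sqlite3.sqlite_version_info >= (3, 25, 0)

# The fallback query from the answer, with the table name set to messages.
FALLBACK_QUERY = """
WITH
  prev_types AS (
    SELECT t1.*, t1.type <> COALESCE(t2.type, '') AS flag, MAX(t2.time) AS max_time
    FROM messages t1 LEFT JOIN messages t2
    ON t2.time < t1.time
    GROUP BY t1.id
  ),
  sum_flags AS (
    SELECT pt1.*, SUM(pt2.flag) AS grp
    FROM prev_types pt1 INNER JOIN prev_types pt2
    ON pt2.time <= pt1.time
    GROUP BY pt1.id
  ),
  cte AS (
    SELECT sf1.*,
           (SELECT COUNT(*) FROM sum_flags sf2 WHERE sf2.grp = sf1.grp) AS count,
           (SELECT COUNT(*) FROM sum_flags sf2
            WHERE sf2.grp = sf1.grp AND sf2.time <= sf1.time) AS rn
    FROM sum_flags sf1
  )
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END AS numConsecutiveItems
FROM cte
WHERE numConsecutiveItems IS NULL OR rn = 1
ORDER BY time;
"""


def load_sample(conn):
    """Create the messages table and insert the five sample rows from the question."""
    conn.execute(
        "CREATE TABLE messages (id TEXT PRIMARY KEY, type TEXT, text TEXT, time TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages (id, type, text, time) VALUES (?, ?, ?, ?)",
        [
            ("first", "random", "hey there", "2022-02-15T01:47:25.581"),
            ("second", "new_item", "new shoe", "2022-02-15T01:48:25.581"),
            ("third", "new_item", "new socks", "2022-02-15T01:49:25.581"),
            ("fourth", "random", "what time is it", "2022-02-15T01:50:25.581"),
            ("fifth", "new_item", "new shoe", "2022-02-15T01:51:25.581"),
        ],
    )


conn = sqlite3.connect(":memory:")
load_sample(conn)
fallback_rows = conn.execute(FALLBACK_QUERY).fetchall()

print("window functions available:", HAS_WINDOW_FUNCTIONS)
for row in fallback_rows:
    print(row)
```

The fallback relies on self-joins and correlated subqueries, so it does more work per row than the window-function version; on large tables the window-function query is likely the better choice whenever the library version allows it.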

