Consider that I have a SQLite table called messages with these columns:

| id | type | text | time |
|----|------|------|------|

id is the primary key, so it is unique. Imagine that I have 5 rows in the following order (represented as a JSON-like array for clarity):
```js
[
  {
    id: 'first',
    type: 'random',
    text: 'hey there',
    time: '2022-02-15T01:47:25.581'
  },
  {
    id: 'second',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:48:25.581'
  },
  {
    id: 'third',
    type: 'new_item',
    text: 'new socks',
    time: '2022-02-15T01:49:25.581'
  },
  {
    id: 'fourth',
    type: 'random',
    text: 'what time is it',
    time: '2022-02-15T01:50:25.581'
  },
  {
    id: 'fifth',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:51:25.581'
  }
]
```
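To reproduce the example locally, here is a minimal setup sketch using Python's built-in sqlite3 module and an in-memory database. The TEXT column types and the helper name make_messages_db are assumptions; the question only names the columns. Storing time as ISO-8601 strings in one fixed format keeps ORDER BY time chronological, because such strings sort the same way as the timestamps they represent.

```python
import sqlite3

# Sample rows from the question as (id, type, text, time) tuples.
SAMPLE_ROWS = [
    ("first", "random", "hey there", "2022-02-15T01:47:25.581"),
    ("second", "new_item", "new shoe", "2022-02-15T01:48:25.581"),
    ("third", "new_item", "new socks", "2022-02-15T01:49:25.581"),
    ("fourth", "random", "what time is it", "2022-02-15T01:50:25.581"),
    ("fifth", "new_item", "new shoe", "2022-02-15T01:51:25.581"),
]


def make_messages_db() -> sqlite3.Connection:
    """Build an in-memory `messages` table holding the sample rows.

    Hypothetical helper; the TEXT column types are an assumption.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id TEXT PRIMARY KEY, type TEXT, text TEXT, time TEXT)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


conn = make_messages_db()
# Fixed-format ISO-8601 text sorts chronologically, so this lists first..fifth.
for row in conn.execute("SELECT id, type, time FROM messages ORDER BY time"):
    print(row)
```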
I want to query these messages so that each run of consecutive rows with type new_item is collapsed into a single row, and that row also reports how many consecutive new_item rows the run contained. Concretely, I would like output that carries the information below (the schema doesn't have to match; this is just an example of what I want):
```js
[
  {
    id: 'first',
    type: 'random',
    text: 'hey there',
    time: '2022-02-15T01:47:25.581'
  },
  {
    id: 'second',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:48:25.581',
    numConsecutiveItems: 2
  },
  {
    id: 'fourth',
    type: 'random',
    text: 'what time is it',
    time: '2022-02-15T01:50:25.581'
  },
  {
    id: 'fifth',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:51:25.581',
    numConsecutiveItems: 1
  }
]
```
Notice that the message with id third is not in the final output because it has type new_item and immediately follows another message with type new_item; for the same reason, the message with id second has a numConsecutiveItems of 2. Likewise, the message with id fifth is present because it doesn't immediately follow another new_item message, and its numConsecutiveItems is 1 for the same reason. Can I achieve this with a single query, ordered by the time column? That is my strong preference, but if not, ideally no more than 2 queries. Thank you!
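The rule being asked for can be pinned down with a small pure-Python reference, a sketch that assumes the rows are already sorted by time. collapse_new_items is a hypothetical helper, not part of any library; it uses itertools.groupby, which only groups adjacent items and therefore matches the "consecutive" requirement.

```python
from itertools import groupby

# The sample messages from the question, already sorted by time.
messages = [
    {"id": "first", "type": "random", "text": "hey there", "time": "2022-02-15T01:47:25.581"},
    {"id": "second", "type": "new_item", "text": "new shoe", "time": "2022-02-15T01:48:25.581"},
    {"id": "third", "type": "new_item", "text": "new socks", "time": "2022-02-15T01:49:25.581"},
    {"id": "fourth", "type": "random", "text": "what time is it", "time": "2022-02-15T01:50:25.581"},
    {"id": "fifth", "type": "new_item", "text": "new shoe", "time": "2022-02-15T01:51:25.581"},
]


def collapse_new_items(rows):
    """Collapse each run of consecutive 'new_item' rows into its first row.

    The kept row gets a numConsecutiveItems key holding the run length;
    rows of any other type pass through unchanged.
    """
    result = []
    # groupby only merges *adjacent* rows with the same key.
    for is_new_item, run in groupby(rows, key=lambda r: r["type"] == "new_item"):
        run = list(run)
        if is_new_item:
            result.append({**run[0], "numConsecutiveItems": len(run)})
        else:
            result.extend(run)
    return result


result = collapse_new_items(messages)
print([(r["id"], r.get("numConsecutiveItems")) for r in result])
# [('first', None), ('second', 2), ('fourth', None), ('fifth', 1)]
```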
Answer:
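Use window functions to split the rows, ordered by time, into groups of consecutive rows that share the same type: LAG() flags every row whose type differs from the previous row's, and a running SUM() of those flags gives each run its own group number. Then count the rows in each group and keep only the first row of every new_item group: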
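```sql
WITH cte AS (
  SELECT *,
         COUNT(*) OVER (PARTITION BY grp) count,                -- size of each group
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY time) rn  -- position inside the group
  FROM (
    -- running total of the flags: one group number per run of identical types
    SELECT *, SUM(flag) OVER (ORDER BY time) grp
    FROM (
      -- flag = 1 when the type differs from the previous row's type
      SELECT *, (type <> LAG(type, 1, '') OVER (ORDER BY time)) flag
      FROM messages
    )
  )
)
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END numConsecutiveItems
FROM cte
-- keep every non-new_item row, and only the first row of each new_item group
WHERE numConsecutiveItems IS NULL OR rn = 1
ORDER BY time;
```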
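To see how the grouping works, the sketch below runs the window-function CTE against the sample data (recreated in an in-memory database with Python's sqlite3 module) and prints the intermediate flag, grp, count and rn columns instead of applying the final filter. It assumes Python's sqlite3 is linked against SQLite 3.25.0 or later.

```python
import sqlite3

# Sample rows from the question as (id, type, text, time) tuples.
SAMPLE_ROWS = [
    ("first", "random", "hey there", "2022-02-15T01:47:25.581"),
    ("second", "new_item", "new shoe", "2022-02-15T01:48:25.581"),
    ("third", "new_item", "new socks", "2022-02-15T01:49:25.581"),
    ("fourth", "random", "what time is it", "2022-02-15T01:50:25.581"),
    ("fifth", "new_item", "new shoe", "2022-02-15T01:51:25.581"),
]

# Same CTE as the answer, but selecting the intermediate columns
# instead of applying the final WHERE filter.
INTERMEDIATE_QUERY = """
WITH cte AS (
  SELECT *,
         COUNT(*) OVER (PARTITION BY grp) count,
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY time) rn
  FROM (
    SELECT *, SUM(flag) OVER (ORDER BY time) grp
    FROM (
      SELECT *, (type <> LAG(type, 1, '') OVER (ORDER BY time)) flag
      FROM messages
    )
  )
)
SELECT id, type, flag, grp, count, rn
FROM cte
ORDER BY time
"""

# LAG, SUM(...) OVER and ROW_NUMBER need SQLite 3.25.0 or later.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this query needs SQLite 3.25.0 or later")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id TEXT PRIMARY KEY, type TEXT, text TEXT, time TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

steps = conn.execute(INTERMEDIATE_QUERY).fetchall()
for row in steps:
    print(row)  # (id, type, flag, grp, count, rn)
```

Each run of identical types gets its own grp value, so the final WHERE keeps second (rn = 1, count = 2), drops third (rn = 2), and keeps every random row because numConsecutiveItems is NULL for them.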
For versions of SQLite that do not support window functions (they were added in SQLite 3.25.0), use aggregations and correlated subqueries to simulate them:
```sql
WITH
-- flag each row whose type differs from the previous row's type;
-- t2.type comes from the row holding MAX(t2.time) (SQLite's bare-column rule)
prev_types AS (
  SELECT t1.*, t1.type <> COALESCE(t2.type, '') flag, MAX(t2.time) max_time
  FROM messages t1 LEFT JOIN messages t2
    ON t2.time < t1.time
  GROUP BY t1.id
),
-- running total of the flags = group number
sum_flags AS (
  SELECT pt1.*, SUM(pt2.flag) grp
  FROM prev_types pt1 INNER JOIN prev_types pt2
    ON pt2.time <= pt1.time
  GROUP BY pt1.id
),
-- group size and position inside the group
cte AS (
  SELECT sf1.*,
         (SELECT COUNT(*) FROM sum_flags sf2 WHERE sf2.grp = sf1.grp) count,
         (SELECT COUNT(*) FROM sum_flags sf2 WHERE sf2.grp = sf1.grp AND sf2.time <= sf1.time) rn
  FROM sum_flags sf1
)
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END numConsecutiveItems
FROM cte
WHERE numConsecutiveItems IS NULL OR rn = 1
ORDER BY time;
```
See the demo.
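The fallback's first CTE, prev_types, depends on a SQLite-specific behavior: when a query contains a single max() aggregate, bare columns such as t2.type take their values from the row that holds the maximum, here the latest earlier message. This sketch runs that step alone on the sample data, with an extra prev_type column added purely for illustration, to show that it yields the same flag values as the LAG() version.

```python
import sqlite3

# Sample rows from the question as (id, type, text, time) tuples.
SAMPLE_ROWS = [
    ("first", "random", "hey there", "2022-02-15T01:47:25.581"),
    ("second", "new_item", "new shoe", "2022-02-15T01:48:25.581"),
    ("third", "new_item", "new socks", "2022-02-15T01:49:25.581"),
    ("fourth", "random", "what time is it", "2022-02-15T01:50:25.581"),
    ("fifth", "new_item", "new shoe", "2022-02-15T01:51:25.581"),
]

# prev_types from the fallback query plus an illustrative prev_type column.
# MAX(t2.time) is the only aggregate, so the bare t2.type is taken from the
# joined row with the greatest earlier time, i.e. the previous message.
PREV_TYPES_QUERY = """
SELECT t1.id,
       COALESCE(t2.type, '') prev_type,
       t1.type <> COALESCE(t2.type, '') flag,
       MAX(t2.time) max_time
FROM messages t1 LEFT JOIN messages t2
  ON t2.time < t1.time
GROUP BY t1.id
ORDER BY t1.time
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id TEXT PRIMARY KEY, type TEXT, text TEXT, time TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

prev_types = [
    (msg_id, prev_type, flag)
    for msg_id, prev_type, flag, _max_time in conn.execute(PREV_TYPES_QUERY)
]
for row in prev_types:
    print(row)  # (id, prev_type, flag)
```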