Home > database >  Cumulative sum with condition in SQLite
Cumulative sum with condition in SQLite

Time:07-05

I have a dataset that looks like the one below and I need to know which songs were played in sequence the most (I am using SQLite).

SELECT endtime, trackname FROM streaming_history AS sh
JOIN albums AS a ON a.albumid = sh.albumid
JOIN tracks AS t ON t.trackid = sh.trackid
endtime trackname
2022-03-21 15:48:00 Walk
2022-03-21 15:50:00 Walk
2022-03-21 15:54:00 One Last Breath
2022-03-21 15:57:00 Breakout
2022-03-21 16:02:00 Outside
2022-03-21 16:07:00 Uprising
2022-03-21 16:12:00 Walk
2022-03-21 16:16:00 One Last Breath
2022-03-21 16:20:00 Breakout
2022-03-21 16:36:00 Outside
2022-03-22 00:10:00 Outside
2022-03-22 10:53:00 Outside

Since I have to know not which songs were played the most, but which song were played the most in sequence, a simple count with group by won't do. From the data extract above, Outside would be first place with three times (not four) and Walk second place with two times.

I managed to create another column flagging whether or not the last played song was the same as the current row song with LAG and CASE, but can't seem to be able to figure out how to create a column to sum using the new column's last row info.

CodePudding user response:

Use SUM() window function to get the groups of sequences of songs, by summing the flag that you created with LAG() and aggregate.
Then use RANK() window function to get the top sequenced song(s):

WITH cte AS (
  SELECT trackid, count
  FROM (
    SELECT MAX(trackid) trackid, 
           COUNT(*) count,
           RANK() OVER (ORDER BY COUNT(*) DESC) rnk
    FROM (
      SELECT *, SUM(flag) OVER (ORDER BY endtime) grp
      FROM (
        SELECT *, trackid <> LAG(trackid, 1, 0) OVER (ORDER BY endtime) flag
        FROM streaming_history
      )
    )
    GROUP BY grp
  )
  WHERE rnk = 1
)
SELECT t.trackname, c.count
FROM tracks t INNER JOIN cte c
ON c.trackid = t.trackid;

See the demo.

  • Related