I have a dataset that looks like the one below and I need to know which songs were played in sequence the most (I am using SQLite).
SELECT endtime, trackname FROM streaming_history AS sh
JOIN albums AS a ON a.albumid = sh.albumid
JOIN tracks AS t ON t.trackid = sh.trackid
endtime | trackname |
---|---|
2022-03-21 15:48:00 | Walk |
2022-03-21 15:50:00 | Walk |
2022-03-21 15:54:00 | One Last Breath |
2022-03-21 15:57:00 | Breakout |
2022-03-21 16:02:00 | Outside |
2022-03-21 16:07:00 | Uprising |
2022-03-21 16:12:00 | Walk |
2022-03-21 16:16:00 | One Last Breath |
2022-03-21 16:20:00 | Breakout |
2022-03-21 16:36:00 | Outside |
2022-03-22 00:10:00 | Outside |
2022-03-22 10:53:00 | Outside |
Since I have to know not which songs were played the most, but which song were played the most in sequence, a simple count with group by won't do. From the data extract above, Outside would be first place with three times (not four) and Walk second place with two times.
I managed to create another column flagging whether or not the last played song was the same as the current row song with LAG and CASE, but can't seem to be able to figure out how to create a column to sum using the new column's last row info.
CodePudding user response:
Use SUM()
window function to get the groups of sequences of songs, by summing the flag that you created with LAG()
and aggregate.
Then use RANK()
window function to get the top sequenced song(s):
WITH cte AS (
SELECT trackid, count
FROM (
SELECT MAX(trackid) trackid,
COUNT(*) count,
RANK() OVER (ORDER BY COUNT(*) DESC) rnk
FROM (
SELECT *, SUM(flag) OVER (ORDER BY endtime) grp
FROM (
SELECT *, trackid <> LAG(trackid, 1, 0) OVER (ORDER BY endtime) flag
FROM streaming_history
)
)
GROUP BY grp
)
WHERE rnk = 1
)
SELECT t.trackname, c.count
FROM tracks t INNER JOIN cte c
ON c.trackid = t.trackid;
See the demo.