I'm a novice to SQL. I'm trying to group data by app version, and only show the two most popular app versions per operating system.
Let's say I have the following columns:
- operating_system - varchar
- app_version - varchar
- start_time - float
- timestamp - timestamp
I can do the following to get the average start time by OS, app version, and day, but I only want the top two or three app versions in that time range.
SELECT operation_system, app_version, DATE(timestamp) as date, AVG(start_time) as average_start_time
FROM ...
WHERE timestamp > timestamp_sub(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY operating_system, app_version, date
ORDER BY operating_system, app_version DESC, date
but I only want the top two app versions per OS where there may be dozens. And using LIMIT
, I believe, would only cap the final total. Is there a good way to only get the two or three most frequent per app version?
Here's an example table without the timestamp:
operating_system | app_version | start_time |
---|---|---|
Windows | 1.23 | 7 |
Windows | 1.22 | 9 |
OS X | 7.0 | 4 |
Windows | 1.22 | 16 |
Windows | 1.23 | 8 |
Windows | 1.21 | 4 |
And here's an example response getting the top-two (by frequency) app versions per OS:
operating_system | app_version | average_start_time |
---|---|---|
Windows | 1.23 | 7.5 |
Windows | 1.22 | 12.5 |
OS X | 7.0 | 4 |
In this example, the row of Windows 1.21 is omitted, but because it had fewer rows than Windows 1.22 or 1.23.
CodePudding user response:
Use below
select * except(cnt) from (
select operating_system, app_version, avg(start_time) average_start_time, count(*) cnt
from your_table
group by operating_system, app_version
)
qualify 2 >= row_number() over(partition by operating_system order by cnt desc)
if applied to sample data in your question - output is