Home > database >  query data for high-frequency rows
query data for high-frequency rows

Time:04-29

I'm a novice to SQL. I'm trying to group data by app version, and only show the two most popular app versions per operating system.

Let's say I have the following columns:

  • operating_system - varchar
  • app_version - varchar
  • start_time - float
  • timestamp - timestamp

I can do the following to get the average start time by OS, app version, and day, but I only want the top two or three app versions in that time range.

SELECT operation_system, app_version, DATE(timestamp) as date, AVG(start_time) as average_start_time
FROM ...
WHERE timestamp > timestamp_sub(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY operating_system, app_version, date
ORDER BY operating_system, app_version DESC, date

but I only want the top two app versions per OS where there may be dozens. And using LIMIT, I believe, would only cap the final total. Is there a good way to only get the two or three most frequent per app version?

Here's an example table without the timestamp:

operating_system app_version start_time
Windows 1.23 7
Windows 1.22 9
OS X 7.0 4
Windows 1.22 16
Windows 1.23 8
Windows 1.21 4

And here's an example response getting the top-two (by frequency) app versions per OS:

operating_system app_version average_start_time
Windows 1.23 7.5
Windows 1.22 12.5
OS X 7.0 4

In this example, the row of Windows 1.21 is omitted, but because it had fewer rows than Windows 1.22 or 1.23.

CodePudding user response:

Use below

select * except(cnt) from (
  select operating_system, app_version, avg(start_time) average_start_time, count(*) cnt
  from your_table
  group by operating_system, app_version
)
qualify 2 >= row_number() over(partition by operating_system order by cnt desc)           

if applied to sample data in your question - output is

enter image description here

  • Related