Home > Blockchain >  Get the top N rows by row count in GROUP BY
Get the top N rows by row count in GROUP BY

Time:10-08

I'm querying a records table to find which users are my top record creators for certain record types. The basic starting point of my query looks something like this:

SELECT recordtype, createdby, COUNT(*)
FROM recordtable
WHERE recordtype in (...)
GROUP BY recordtype, createdby
ORDER BY recordtype, createdby DESC

But there are many users who have created records - I want to narrow this down further.

I added HAVING COUNT(*) > ..., but some record types only have a few records, while others have hundreds. If I do HAVING COUNT(*) > 10, I won't see that all 9 records of type "XYZ" were created by the same person, but I will have to scroll through every person that's created only 15, 30, 50, etc. of the 3,500 records of type "ABC."

I only want the top 5, 10, or so creators for each record type.

I've found a few questions that address the "select top N in group" part of the question, but I can't figure out how to apply them to what I need. The answers I could find are in cases where the "rank by" column is a value stored in the table, not an aggregate.

(Example: "what are the top cities in each country by population?", with data that looks like this:)

Country       City     Population
United States New York 123456789
United States Chicago  123456789
France        Paris    123456789

I don't know how to apply the methods I've seen used to answer that (row_number(), mostly) to get the top N by COUNT(*).

CodePudding user response:

here is one way , to get top 10 rows in each group:

select * from(
    select *, row_number() over (partition by recordtype order by cnt desc) rn
    from (
       SELECT recordtype, createdby, COUNT(*) cnt
       FROM recordtable
       WHERE recordtype in (...)
       GROUP BY recordtype, createdby
    )t 
)t where rn <= 10 

CodePudding user response:

If i understand well you want to get the top N records with biggest count. You can achieve this with a subquery like this (I suppose you are using MySQL or PostGRESQL or db2, in other DB engines the limit and offset may differ, as for example in sqlserver that is achieved with select top n * from...

SELECT A.recordtype, A.createdby, A.total FROM (
    SELECT recordtype, createdby, COUNT(*) as total
    FROM recordtable
    WHERE recordtype in (...)
    GROUP BY recordtype, createdby
) AS A ORDER BY recordtype, createdby, total DESC
LIMIT 10 OFFSET 0

Limit is the number of records you want in the results page, and offset is the number of records to skip before taking the result page.

If you use sqlserver it may look like this (there is also a way to apply an offset, you can take a look here SQL Server OFFSET and LIMIT)

SELECT TOP 10 A.recordtype, A.createdby, A.total FROM (
    SELECT recordtype, createdby, COUNT(*) as total
    FROM recordtable
    WHERE recordtype in (...)
    GROUP BY recordtype, createdby
) AS A ORDER BY recordtype, createdby, total DESC

For a grouped result then you can take a look at this post http://www.silota.com/docs/recipes/sql-top-n-group.html

So to take first 10 records in groups not only the first i mix this answer, with the link below and the approach of @eshirvana

SELECT * FROM (
    SELECT *, row_number() OVER (PARTITION recordtype BY ORDER BY total DESC) rn
    FROM (
        SELECT recordtype, createdby, COUNT(*) as total
        FROM recordtable
        WHERE recordtype in (...)
        GROUP BY recordtype, createdby
    ) t
) t WHERE total <= 10 
  • Related