Home > Mobile >  T-SQL query getting the max of a sub element
T-SQL query getting the max of a sub element

Time:11-12

Consider a table like this:

Category Subcategory Item
Foo Apple i1
Foo Apple i2
Foo Apple i3
Foo Pear i4
Foo Pear i5
Bar Blackberry i6
Bar Blueberry i7
Bar Blueberry i8

I want to, for each category, get the subcategory with the highest count of items. I don't care about the identity of the items (or even their count). So, I'd expect the final return to be

Category Subcategory
Foo Apple
Bar Blueberry

I've tried

WITH pool AS (
    SELECT
        category,
        subcategory,
        COUNT(item) AS "itemCount"
    FROM table
    GROUP BY category, subcategory
),
maxItems AS (
    SELECT
        category,
        MAX(subcategory), -- In real life, this is a numeric column
    FROM pool
    GROUP BY category
    HAVING itemCount = MAX(itemCount)
)
-- and final query here filtered on the category -> subcategory mapping from above

but the HAVING statement errors with

is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.

Of course it's not in the group by. I don't want to group by the max count, I want to filter by it.

I can make it work with a subquery in maxItems, changing it to

maxItems AS (
    SELECT
        category,
        MAX(subcategory), -- In real life, this is a numeric column
    FROM pool
    JOIN (
        SELECT
            subcategory,
            MAX(itemCount) AS "itemCount"
        FROM pool
        GROUP BY subcategory
    ) AS "maxFilter"
        ON rmCounts.subcategory = maxFilter.subcategory
        AND maxFilter.itemCount = rmCounts.itemCount
    GROUP BY category
)

but I really feel like it'd be more elegant and more clear if HAVING worked, and I don't understand why it doesn't.

CodePudding user response:

You can do it with FIRST_VALUE() window function:

SELECT DISTINCT Category,
       FIRST_VALUE(Subcategory) OVER (PARTITION BY Category ORDER BY COUNT(*) DESC) Subcategory
FROM tablename 
GROUP BY Category, Subcategory;

See the demo.

CodePudding user response:

here is one way, which handles ties as well :

select * from (
   select category,Subcategory,rank() over (partition by category order by count(*) desc) rn 
   from tablename
   group by category,Subcategory
)t where rn = 1

db<>fiddle here

CodePudding user response:

This gets the highest of each subcategory, and returns two subcategories if the counts tie:

select a.category, a.subcategory, itemcounts.total
from table a
cross apply ( select top 1 b.subcategory, count(b.item) as total
            from table b 
            where b.category = a.category
            group by b.subcategory
            order by count(b.item) desc) itemcounts
group by a.category, a.subcategory, itemcounts.total
having count(a.item) = itemcounts.total
  • Related