Consider a table like this:
Category | Subcategory | Item |
---|---|---|
Foo | Apple | i1 |
Foo | Apple | i2 |
Foo | Apple | i3 |
Foo | Pear | i4 |
Foo | Pear | i5 |
Bar | Blackberry | i6 |
Bar | Blueberry | i7 |
Bar | Blueberry | i8 |
I want to, for each category
, get the subcategory
with the highest count of item
s. I don't care about the identity of the items (or even their count). So, I'd expect the final return to be
Category | Subcategory |
---|---|
Foo | Apple |
Bar | Blueberry |
I've tried
WITH pool AS (
SELECT
category,
subcategory,
COUNT(item) AS "itemCount"
FROM table
GROUP BY category, subcategory
),
maxItems AS (
SELECT
category,
MAX(subcategory), -- In real life, this is a numeric column
FROM pool
GROUP BY category
HAVING itemCount = MAX(itemCount)
)
-- and final query here filtered on the category -> subcategory mapping from above
but the HAVING statement errors with
is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.
Of course it's not in the group by. I don't want to group by the max count, I want to filter by it.
I can make it work with a subquery in maxItems, changing it to
maxItems AS (
SELECT
category,
MAX(subcategory), -- In real life, this is a numeric column
FROM pool
JOIN (
SELECT
subcategory,
MAX(itemCount) AS "itemCount"
FROM pool
GROUP BY subcategory
) AS "maxFilter"
ON rmCounts.subcategory = maxFilter.subcategory
AND maxFilter.itemCount = rmCounts.itemCount
GROUP BY category
)
but I really feel like it'd be more elegant and more clear if HAVING
worked, and I don't understand why it doesn't.
CodePudding user response:
You can do it with FIRST_VALUE()
window function:
SELECT DISTINCT Category,
FIRST_VALUE(Subcategory) OVER (PARTITION BY Category ORDER BY COUNT(*) DESC) Subcategory
FROM tablename
GROUP BY Category, Subcategory;
See the demo.
CodePudding user response:
here is one way, which handles ties as well :
select * from (
select category,Subcategory,rank() over (partition by category order by count(*) desc) rn
from tablename
group by category,Subcategory
)t where rn = 1
db<>fiddle here
CodePudding user response:
This gets the highest of each subcategory, and returns two subcategories if the counts tie:
select a.category, a.subcategory, itemcounts.total
from table a
cross apply ( select top 1 b.subcategory, count(b.item) as total
from table b
where b.category = a.category
group by b.subcategory
order by count(b.item) desc) itemcounts
group by a.category, a.subcategory, itemcounts.total
having count(a.item) = itemcounts.total