Get rows with maximum count per one column - while grouping by two columns-CodePudding

I'm trying to get max count of a field. This is what I get and what I'm tried to do.

| col1 | col2 |
|  A   |  B   |
|  A   |  B   |
|  A   |  D   |
|  A   |  D   |
|  A   |  D   |
|  C   |  F   |
|  C   |  G   |
|  C   |  F   |

I'm trying to get the max count occurrences of col2, grouped by col1.

With this query I get the occurrences grouped by col1 and col2.

SELECT col1, col2, count(*) as conta 
FROM tab 
WHERE 
GROUP by col1, col2 
ORDER BY col1, col2

And I get:

| col1 | col2 | conta |
|  A   |  B   |   2   |
|  A   |  D   |   3   |
|  C   |  F   |   2   |
|  C   |  G   |   1   |

Then I used this query to get max of count:

SELECT max(conta) as conta2, col1 
FROM (
    SELECT col1, col2, count(*) as conta 
    FROM tab 
    WHERE 
    GROUP BY col1, col2 
    ORDER BY col1, col2
) AS derivedTable 
GROUP BY col1

And I get:

| col1 | conta |
|  A   |   3   |
|  C   |   2   |

What I'm missing is the value of col2. I would like something like this:

| col1 | col2 | conta |
|  A   |  D   |   3   |
|  C   |  F   |   2   |

The problem is that if I try to select the col2 field, I get an error message, that I have to use this field in group by or aggregation function, but using it in the group by it's not the right way.

CodePudding user response：

Simpler & faster (and correct):

SELECT DISTINCT ON (col1)
       col1, col2, count(*) AS conta
FROM   tab 
GROUP  BY col1, col2 
ORDER  BY col1, conta DESC;

db<>fiddle here (based on a_horse's fiddle)

DISTINCT ON is applied after aggregation, so we don't need a subquery or CTE. Consider the sequence of events in a SELECT query:

CodePudding user response：

You can combine GROUP BY with a window function - which gets evaluated after the group by:

with cte as (
  SELECT col1, col2, 
         count(*) as conta,
         dense_rank() over (partition by col1 order by count(*) desc) as rnk
  FROM tab 
  WHERE ...
  GROUP by col1, col2 
) 
select col1, col2, conta
from cte
where rnk = 1
order by col1, col2;

This will return the combination of col1,col2 with the same highest max count twice. If you don't want that, use row_number() instead of dense_rank()

Online example

CodePudding user response：

Possibly not the most elegant solution, but using a common table expression may help.

with cte as (
select col1, col2, count(*) as total
from dtable 
group by col1, col2
)
select  col1, col2, total 
from cte c
where total = (select max(total) 
           from cte cc
           where cc.col1 = c.col1)
order by col1 asc

Returns

col1|col2|total|
---- ---- ----- 
 A  | D  |    3|
 C  | F  |    2|

from the docs

CodePudding user response：

I misunderstood the question. Here is your solution:

;with tablex as
    (Select col1, col2, Count(col2) as Count From Your_Table Group by col1, col2),
aaaa as
    (Select ROW_NUMBER() over (partition by col1 order by Count desc) as row, * From tablex)

Select * From aaaa Where row = 1

CodePudding user response：

Using a window function:

select distinct on (col1) col1, col2, cnt
from 
(
 select col1, col2, count(*) over (partition by col1, col2) cnt 
 from the_table
) t
order by col1, cnt desc;

col1	col2	cnt
A	D	3
C	F	2

This solution does not solve cases with ties.