Select rows without the maximum amount or rows from grouping-CodePudding

I have a table called MyTable which has columns A, B and then multiple other columns where the values don't matter.

What I want to do is filter out all the rows, that when we group the data by A gives the maximum amount of rows for the given B. Probably easier to explain with an example, if the data looked like this

A B ...
a f ...
a f ...
a f ...
a g ...
a g ...
b h ...
b h ...
b i ...
b i ...
c j ...
c j ...

The output would be

A B ...
a g ...
a g ...
b i ...
b i ...

Filtered out the all the data with (a, f) because there is 3 of them compared to only 2 of (a, g).

Filtered out (b, h) because there is 2 of them compared to 2 of (b, i), in this case it makes no difference which we filter out as long it's one of them.

Filtered out(c, j) as it is the only grouping and therefore still the maximum amount.

In term of how to implement this I'm thinking we need to do something like this at some point to get the amount for each grouping:

SELECT A, B, count()
FROM MyTable
GROUP BY A, B

This should initially give something of the form:

A B count
a f 3
a g 2
b h 2
b i 2
c j 2

Not sure at this point how to get the maximum for each A then apply it when selecting from the original table?

CodePudding user response：

If your RDBMS uses window functions use row_number in a CTE

with cte as(
select
  grouper, 
  value,
  count(value) "number",
  row_number() over (partition by grouper order by count(value) desc) rn
from t
group by 
  grouper,
  value)
select * from cte where rn = 1;

grouper | value | number | rn
:------ | :---- | -----: | -:
a       | f     |      3 |  1
b       | h     |      2 |  1
c       | j     |      2 |  1

db<>fiddle here

CodePudding user response：

Here is one option:

with tabl1 as(
select col1,col2,count(*)over(partition by col1,col2) cnt
from test1 t1)
select col1,max(col2) from tabl1 t1
WHERE exists(
       select *
       from tabl1 t2
       where t1.cnt
       <=t2.cnt and t1.col1=t2.col1 and t1.col2!=t2.col2
       )
       group by col1

Sample:

create table test1 (col1 varchar(1),col2 varchar(1));

insert into test1 values ('a', 'f'); 
insert into test1 values ('a', 'f'); 
insert into test1 values ('a', 'f'); 
insert into test1 values ('a', 'g'); 
insert into test1 values ('a', 'g');
insert into test1 values ('b', 'h'); 
insert into test1 values ('b', 'h'); 
insert into test1 values ('b', 'i'); 
insert into test1 values ('b', 'i'); 
insert into test1 values ('c', 'j'); 
insert into test1 values ('c', 'j');

Result:

COL1 | MAX(COL2)
b      i
a      g