Select multiple distinct rows-CodePudding

I have a table with following data.

id country serial other_column
1  us      123    1
2  us      456    1
3  gb      123    1
4  gb      456    1
5  jp      777    1
6  jp      888    1

7  us      123    2
8  us      456    3
9  gb      456    4
10  us     123    1
11  us     123    1

Is there a way to fetch 2 rows per unique country and unique serial?

For example, expecting following results from my query.

us,123,1 comes twice cos there was 3 of the same kind and I want 2 rows per unique country and unique serial.

us,123,1
us,123,1 
us,456,1
gb,123,1
gb,456,1
jp,777,1
jp,888,1

I can't use:

select distinct country, serial from my_table;

Since I want 2 rows per distinct value match for country and serial. Pls advice.

CodePudding user response：

Assign DENSE_RANK and ROW_NUMBER to your data set using a CTE or subquery then return rows with a ROW_NUMBER less than 3 and a DENSE_RANK equal to 1. Also, since you did not specify the ORDERING, I've added a custom ORDER BY to handle your Country sorting to match your desired output above.

SELECT
  ID,
  Country, 
  Serial, 
  other_column
FROM 
  (SELECT 
   *,
  ROW_NUMBER() OVER (PARTITION BY Country, Serial ORDER BY Country, Serial, other_column) AS RN,
  DENSE_RANK() OVER (PARTITION BY Country, Serial ORDER BY Country, Serial, other_column) AS DR
FROM my_table) A
  WHERE RN < 3 AND DR = 1
ORDER BY CASE WHEN Country = 'us' THEN 1
              WHEN Country = 'gb' THEN 2
              WHEN Country = 'jp' THEN 3
              ELSE 4
         END ASC, Country, Serial, other_column, ID

Result:

| ID | Country | Serial | other_column  |
|----|---------|--------|---------------|
| 1  | us      | 123    | 1             |
| 10 | us      | 123    | 1             |
| 2  | us      | 456    | 1             |
| 3  | gb      | 123    | 1             |
| 4  | gb      | 456    | 1             |
| 5  | jp      | 777    | 1             |
| 6  | jp      | 888    | 1             |

Fiddle here.

CodePudding user response：

since you didn't specify any logic which rows it should pick based on the 'other column' value (i am assuming it doesn't matter for you).

Having said that, my code will always pick two rows based on unique country and serial with the other_column value as ascending. For example if you have 3 rows:

us, 123, 1
us, 123, 1
us, 123, 2

it will go for first two since other_column value is set to ASC, if you want the other way around you can change the code to order by DESC within the partition by clause.

If there are less than 3 rows for the country and serial it would just pick 1 row.

for example

us, 456, 1
us, 456, 1
us, 123, 1
us, 123, 2
us, 123, 3

would result in:

us, 456,1
us, 123,1
us, 123,2

with main as (
select 
*,
count(*) over(partition by country, serial) as total_occurence,
row_number() over(partition by country, serial order by other_column) as rank_
from <table_name>
),

conditions as (
  select *, 
  case when total_occurence < 3 and rank_ = 1 then true 
       when total_occurence >=3 and rank_ in (1,2) then true else
  false end as is_relevant
from main
)

select * from conditions where is_relevant