Return unique records in bigquery-CodePudding

This is my table

id	fruit	color	score
1	apple	green	10
2	apple	red	9.4
3	apple	yellow	6
4	lemon	green	8
5	lemon	yellow	5
6	banana	yellow	10
7	banana	red	6

It's sorted by score, I just need to get one record by fruit, I don't care about the color, however I need to show just the first value in color.

My desired outcome

id	fruit	color	score
1	apple	green	10
4	lemon	green	8
6	banana	yellow	10

I can't use SELECT DISTINCT because If I'm returning the color column, it makes the entire record unique. I'm thinking of a script that once it finds a fruit that's already in the results, ignore the rest and show only the first one.

CodePudding user response：

select * from [table_name]
qualify row_number() over(partition by fruit order by score desc) = 1

assuming you are in need of first value, based on the highest score
use the row_number(), it will produce 1,2,3 for each fruit and selecting qualify = 1 would return one row
use qualify to directly filter the windows function results
further you can read if you are looking for rank(), row_number() or dense_rank() from here SQL RANK() versus ROW_NUMBER()

CodePudding user response：

You could use rank() and partition by fruit as a subquery...

select id, fruit, color, score
from (
  select id, fruit, color, score, 
  rank() over (partition by fruit order by score desc) as frank
  from my_data
  )z
where frank = 1

id	fruit	color	score
1	apple	green	10
6	banana	yellow	10
4	lemon	green	8

If you have ties but only want one row each, then replace rank() with row_number().

CodePudding user response：

This would do the trick, as window functions tend to be slower onb big data

SELECT t1.id,   t1.fruit,   t1.color,   t1.score
FROM my_table t1 INNER JOIN (
SELECT MIN(id) min_id, fruit FROM my_table GROUP BY fruit) t2 WHERE t1.id = t2.min_id