Home > Net >  Return unique records in bigquery
Return unique records in bigquery

Time:10-11

This is my table

id fruit color score
1 apple green 10
2 apple red 9.4
3 apple yellow 6
4 lemon green 8
5 lemon yellow 5
6 banana yellow 10
7 banana red 6

It's sorted by score, I just need to get one record by fruit, I don't care about the color, however I need to show just the first value in color.

My desired outcome

id fruit color score
1 apple green 10
4 lemon green 8
6 banana yellow 10

I can't use SELECT DISTINCT because If I'm returning the color column, it makes the entire record unique. I'm thinking of a script that once it finds a fruit that's already in the results, ignore the rest and show only the first one.

CodePudding user response:

select * from [table_name]
qualify row_number() over(partition by fruit order by score desc) = 1
  • assuming you are in need of first value, based on the highest score
  • use the row_number(), it will produce 1,2,3 for each fruit and selecting qualify = 1 would return one row
  • use qualify to directly filter the windows function results
  • further you can read if you are looking for rank(), row_number() or dense_rank() from here SQL RANK() versus ROW_NUMBER()

CodePudding user response:

You could use rank() and partition by fruit as a subquery...

select id, fruit, color, score
from (
  select id, fruit, color, score, 
  rank() over (partition by fruit order by score desc) as frank
  from my_data
  )z
where frank = 1
id fruit color score
1 apple green 10
6 banana yellow 10
4 lemon green 8

If you have ties but only want one row each, then replace rank() with row_number().

CodePudding user response:

This would do the trick, as window functions tend to be slower onb big data

SELECT t1.id,   t1.fruit,   t1.color,   t1.score
FROM my_table t1 INNER JOIN (
SELECT MIN(id) min_id, fruit FROM my_table GROUP BY fruit) t2 WHERE t1.id = t2.min_id
  • Related