Home > Net >  Pick a record based on the value of one column being the greatest in Snowflake
Pick a record based on the value of one column being the greatest in Snowflake

Time:08-24

Let's say I have a table structured like this

Name Score
Mike 40
Mike 79
Mike 49

And I wanted to return just the row that says Mike with the score of 79 and nothing else.

The code I have been playing around with looks like this:

SELECT Name, COUNT(Name), greatest(Score) FROM
table GROUP BY Name, Score

I tried a few different variations like using Rank and the greatest function, but haven't had too much luck. Any help would be much appreciated, thanks.

CodePudding user response:

Using QUALIFY and RANK/ROW_NUMBER:

SELECT *
FROM tab
QUALIFY RANK() OVER(PARTITION BY Name ORDER BY Score DESC) = 1

CodePudding user response:

The long form explanation:

If you add a ROW_NUMBER, and a RANK to the altered data:

WITH data(name, score, extra) as (
    select * from values
    ('Mike', 40, 'a'),
    ('Mike', 79, 'b'),
    ('Mike', 79, 'c')
)
select *
    ,row_number() over (partition by name order by score desc) as rn
    ,rank() over (partition by name order by score desc) as rank
from data;
NAME SCORE EXTRA RN RANK
Mike 79 b 1 1
Mike 79 c 2 1
Mike 40 a 3 3

You can see that ROW_NUMBER will only assign the value 1 to one value, where RANK will give you as many values that hold that spot, and in the case of sparse RANK there will be gaps, as Mike,40 is the third value. So the choice between RANK/ROW_NUMBER depends how you want to handle the results and if you are joining to this data etc.

Then you can do a filter in the classic ANSI form:

WITH data(name, score, extra) as (
    select * from values
    ('Mike', 40, 'a'),
    ('Mike', 79, 'b'),
    ('Mike', 79, 'c')
)
select name, score, extra
from (
    select *
        ,row_number() over (partition by name order by score desc) as rn
    from data
)
where rn = 1;

Note this is an unstable sort, as Mike,79,b OR Mike,79,c can be returned by the database, but with ROW_NUMBER you will only get one.

Snowflake has the QUALIFY command which allows dropping the sub-select, and having another filter run after grouping is complete.

So you can write:

select *
    ,row_number() over (partition by name order by score desc) as rn
from data
QUALIFY rn = 1;
NAME SCORE EXTRA RN
Mike 79 b 1

but if you do not want to see the ROW_NUMBER value, it can be moved to the QUALIFY and the scope time is the exact same as the query, but it makes things tidier:

WITH data(name, score, extra) as (
    select * from values
    ('Mike', 40, 'a'),
    ('Mike', 79, 'b'),
    ('Mike', 79, 'c')
)
select *
from data
QUALIFY row_number() over (partition by name order by score desc) = 1;
NAME SCORE EXTRA
Mike 79 b
  • Related