Home > Software engineering >  BigQuery SQL Rank() Over() with an exception
BigQuery SQL Rank() Over() with an exception

Time:12-09

I'm trying to create a rank() over() but need it to return null when meeting a certain criteria. Then continue the rank on the next row.

Example of what i'm trying to accomplish is the column rank_over_except in this image.

enter image description here

It is ranking() over() the identifier and is ordered by original_nr column. In this case it doesn't "rank" when the fruit is a pear.

I have tried using a CASE WHEN THEN statement. But that is simply continuing the count. The null would simply replace a 2 on the 2nd row of rank_over_except in this example. And the third row would be a 3. So that's not working as expected.

I don't see any option to write the rank() over(). Maybe there's another way of doing this and not use rank()? I've gone through the BigQuery docs, but no luck in finding a solution.

CodePudding user response:

Starting from the original numbering, we can use arithmethics and a conditional count to compute the new rank:

select t.*,
    original_nr - countif(fruit = 'apple') over(partition by identifier order by original) rank_over_except
from mytable t

A more canonical approach defines a partition which contains everything but apples with a CASE expression:

select t.*,
    case when fruit != 'apple' 
        then rank() over(
            partition by identifier, case when fruit = 'apple' then 1 else 0 end 
            order by original_nr
        ) 
    end rank_over_except
from mytable t
order by id

Both solutions are more efficient than the union approach, that requires scanning the table twice.

CodePudding user response:

The solution that worked fine for my case is based on the comment of @SR3142. Maybe with row_number @GMB's solution can also work, but the size of the data was neglegible in my case.

select 
  t.*, 
  ROW_NUMBER() OVER (PARTITION BY identifier ORDER BY original_nr ASC) rank_over_except 
from table as t 
  WHERE t.fruit NOT IN ("pear", "apple")
UNION ALL
SELECT
  NULL AS rank_over_except, 
  t.* 
FROM table AS t 
WHERE 
  t.fruit IN ("pear", "apple")

Reason to use row_number() over rank() is that row_number does not take in to account the data from the input column. So it's an actual "dumb" next in line number.

  • Related