Home > Enterprise >  Exclude row in ROW_NUMBER() with case
Exclude row in ROW_NUMBER() with case

Time:10-14

sales_data table:

create table sales_data
(
    store int,
    sales_year int,
    sales_total float
);

sales_data data:

store sales_year sales_total
1 2020 100
1 2021 120
1 2022 60

Problematic query on sales_data:

select 
    sales_year,
    sales_total,
    case 
        when sales_year <> 2022
            then row_number() over (partition by store order by sales_total) 
    end as rn
from 
    sales_data
order by 
    sales_year

Result (unwanted):

sales_year sales_total rn
2020 100 2
2021 120 3
2022 60

Why does row_number() start from 2? How I can make it start from 1? I want to keep all sales year, so I cannot exclude in where.

Desired result:

sales_year sales_total rn
2020 100 1
2021 120 2
2022 60

CodePudding user response:

The ROW_NUMBER ranking starts from 2 instead of 1 because it works on all your data. Not displaying it doesn't mean value 1 doesn't get assigned to 2022.

In order to solve this problem, you can twist your ROW_NUMBER ordering just a bit by adding partitioning on the boolean value sales_year = 2022 as follows:

select 
    sales_year,
    sales_total,
    case 
        when sales_year <> 2022
        then ROW_NUMBER() OVER (PARTITION BY store, sales_year=2022 ORDER BY sales_total) 
    end as rn
from sales_data
order by sales_year

Check the demo here.

CodePudding user response:

For the hell of it, here's another (though I would say worse) way to do it.

Instead of creating two partitions per store, force some rows to be ordered last within the partition.

That way, when your CASE expression 'ignores' some of the results, they're always the last rows and so don't leave 'holes' in the sequences.

select 
    sales_year,
    sales_total,
    case 
        when sales_year <> 2022
            then row_number() over (partition by store order by sales_year = 2022, sales_total) 
    end as rn
from 
    sales_data
order by 
    sales_year

Note: False comes before True, like 0 comes before 1. So, the ORDER BY puts the rows that are 2022 after all the other rows.

Partitioning is cheaper than sorting, so don't do this. It's just an example to help clarify that "excluding rows" from rowNumber isn't an option, but that if you control where a row is included, you can subsequently ignore the result safely.

https://www.db-fiddle.com/f/24ZxTg91AWpkJwZqZa9Kxd/0

  • Related