sales_data
table:
create table sales_data
(
store int,
sales_year int,
sales_total float
);
sales_data
data:
store | sales_year | sales_total |
---|---|---|
1 | 2020 | 100 |
1 | 2021 | 120 |
1 | 2022 | 60 |
Problematic query on sales_data
:
select
sales_year,
sales_total,
case
when sales_year <> 2022
then row_number() over (partition by store order by sales_total)
end as rn
from
sales_data
order by
sales_year
Result (unwanted):
sales_year | sales_total | rn |
---|---|---|
2020 | 100 | 2 |
2021 | 120 | 3 |
2022 | 60 |
Why does row_number()
start from 2? How I can make it start from 1?
I want to keep all sales year, so I cannot exclude in where.
Desired result:
sales_year | sales_total | rn |
---|---|---|
2020 | 100 | 1 |
2021 | 120 | 2 |
2022 | 60 |
CodePudding user response:
The ROW_NUMBER
ranking starts from 2 instead of 1 because it works on all your data. Not displaying it doesn't mean value 1 doesn't get assigned to 2022.
In order to solve this problem, you can twist your ROW_NUMBER
ordering just a bit by adding partitioning on the boolean value sales_year = 2022
as follows:
select
sales_year,
sales_total,
case
when sales_year <> 2022
then ROW_NUMBER() OVER (PARTITION BY store, sales_year=2022 ORDER BY sales_total)
end as rn
from sales_data
order by sales_year
Check the demo here.
CodePudding user response:
For the hell of it, here's another (though I would say worse) way to do it.
Instead of creating two partitions per store, force some rows to be ordered last within the partition.
That way, when your CASE expression 'ignores' some of the results, they're always the last rows and so don't leave 'holes' in the sequences.
select
sales_year,
sales_total,
case
when sales_year <> 2022
then row_number() over (partition by store order by sales_year = 2022, sales_total)
end as rn
from
sales_data
order by
sales_year
Note: False comes before True, like 0 comes before 1. So, the ORDER BY puts the rows that are 2022 after all the other rows.
Partitioning is cheaper than sorting, so don't do this. It's just an example to help clarify that "excluding rows" from rowNumber isn't an option, but that if you control where a row is included, you can subsequently ignore the result safely.