I am trying to to create a rank for each instance of a status occurring, for example
ID | Status | From_date | To_date | rank |
---|---|---|---|---|
1 | Available | 2022-01-01 | 2022-01-02 | 1 |
1 | Available | 2022-01-02 | 2022-01-03 | 1 |
1 | Unavailable | 2022-01-03 | 2022-01-10 | 2 |
1 | Available | 2022-01-10 | 2022-01-20 | 3 |
For each ID
, for each instance of a status
occurring, by from_date
ascending.
I want to do this as i see this as the best way of getting to the final result i want which is
ID | Status | From_date | To_date | rank |
---|---|---|---|---|
1 | Available | 2022-01-01 | 2022-01-03 | 1 |
1 | Unavailable | 2022-01-03 | 2022-01-10 | 2 |
1 | Available | 2022-01-10 | 2022-01-20 | 3 |
I tried dense_rank(partition by id order by status, from_date
but can see now why that wouldnt work. Not sure how to get to this result.
CodePudding user response:
This is a typical gaps-and-island problem, where islands are groups of consecutive rows that have the same status.
Here is one way to solve it with window functions:
select id, status,
min(from_date) from_date, max(to_date) to_date,
row_number() over (partition by id order by min(from_date)) rn
from (
select t.*,
row_number() over (partition by id order by from_date) rn1,
row_number() over (partition by id, status order by from_date) rn2
from mytable t
) t
group by id, status, rn1 - rn2
order by min(from_date)
This worked by ranking rows within two different partitions (with a without the status) ; the difference between the row numbers define the islands.
CodePudding user response:
So with this CTE for the data:
with data(ID, Status, From_date, To_date) as (
select * from values
(1, 'Available', '2022-01-01', '2022-01-02'),
(1, 'Available', '2022-01-02', '2022-01-03'),
(1, 'Unavailable', '2022-01-03', '2022-01-10'),
(1, 'Available', '2022-01-10', '2022-01-20')
)
the first result, being rank can be done with CONDITIONAL_CHANGE_EVENT:
select *
,CONDITIONAL_CHANGE_EVENT( Status ) OVER ( PARTITION BY ID ORDER BY From_date ) as rank
from data;
ID | STATUS | FROM_DATE | TO_DATE | RANK |
---|---|---|---|---|
1 | Available | 2022-01-01 | 2022-01-02 | 0 |
1 | Available | 2022-01-02 | 2022-01-03 | 0 |
1 | Unavailable | 2022-01-03 | 2022-01-10 | 1 |
1 | Available | 2022-01-10 | 2022-01-20 | 2 |
and thus the keeps the first of each rank can be achieved with a QUALIFY/ROW_NUMBER, because the CONDITIONAL_CHANGE is a complex operation, needs wrapping in a sub-select, so the answer is not as short as I would like:
select * from (
select *
,CONDITIONAL_CHANGE_EVENT( Status ) OVER ( PARTITION BY ID ORDER BY From_date ) as rank
from data
)
qualify row_number() over(partition by id, rank ORDER BY From_date ) = 1
gives:
ID | STATUS | FROM_DATE | TO_DATE | RANK |
---|---|---|---|---|
1 | Available | 2022-01-01 | 2022-01-02 | 0 |
1 | Unavailable | 2022-01-03 | 2022-01-10 | 1 |
1 | Available | 2022-01-10 | 2022-01-20 | 2 |
Also, the final result minus the ranking can be done with:
select *
from data
qualify nvl(Status <> lag(status) over ( PARTITION BY ID ORDER BY From_date ), true)
ID | STATUS | FROM_DATE | TO_DATE |
---|---|---|---|
1 | Available | 2022-01-01 | 2022-01-02 |
1 | Unavailable | 2022-01-03 | 2022-01-10 |
1 | Available | 2022-01-10 | 2022-01-20 |
and thus a rank can be added at the end
select *
,rank() over ( PARTITION BY ID ORDER BY From_date ) as rank
from (
select *
from data
qualify nvl(Status <> lag(status) over ( PARTITION BY ID ORDER BY From_date ), true)
)
ID | STATUS | FROM_DATE | TO_DATE | RANK |
---|---|---|---|---|
1 | Available | 2022-01-01 | 2022-01-02 | 1 |
1 | Unavailable | 2022-01-03 | 2022-01-10 | 2 |
1 | Available | 2022-01-10 | 2022-01-20 | 3 |
CodePudding user response:
You can group consecutive status using conditional_change_event
, then collapse the dates using min
and max
, and finally use row_number()
to rank the events
with cte as
(select *,conditional_change_event(status) over (partition by id order by from_date) as rn
from t)
select id,
status,
min(from_date) as from_date,
max(to_date) as to_date,
row_number() over (partition by id, order by min(from_date), max(to_date)) as rank
from cte
group by id, status, rn
order by rank