How can I create row numbers defined only by the previous row's value?-CodePudding

This is a task previously accomplished by a cursor in a really old T-SQL script that I now have to get rid of. For a person in a table ordered by dates, I have a value indicating a sequence is starting, then continuing, and then when a new one starts (indicating the old one has ended). I cannot figure out how to get each of these sequences to have row numbers. I had something similar in an R codebase a few years ago I used RLE for but this has me stumped. I need to get from this:

ID  STATUS              DATE        A       B
1   START               2000-01-01  1       1
1   CONTINUATION_A&B    2000-01-02  NULL    NULL
1   CONTINUATION_A&B    2000-01-03  NULL    NULL
1   START               2000-01-04  1       1
1   START               2000-01-05  1       1
1   CONTINUATION_A      2000-01-06  NULL    NULL
1   CONTINUATION_A      2000-01-07  NULL    NULL

To this:

ID  STATUS              DATE        A       B
1   START               2000-01-01  1       1
1   CONTINUATION_A&B    2000-01-02  2       2
1   CONTINUATION_A&B    2000-01-03  3       3
1   START               2000-01-04  1       1
1   START               2000-01-05  1       1
1   CONTINUATION_A      2000-01-06  2       1
1   CONTINUATION_A      2000-01-07  2       1

Thanks in advance.

CodePudding user response：

Not an answer, but important for eventually answering the question and too long for a comment (hence community wiki).

I see this:

in a table orderd by dates

... but I don't see any dates in the question. Where is the date field in your sample data? We need to at least know it's name to give you good code.

One thing to get drilled into your head is tables never have any inherent or natural order. While the primary key/clustered index or insert order may seem like a natural table order, there are plenty of things that can mess with this, and unless you are explicit in your code about the order of your records the database is free to give you results in any order it finds convenient. That is, if there's not a fully-deterministic ORDER BY clause, the ordering of the results for the same query can and does change from moment to moment, depending on things like what other queries are currently running to access the same data or what pages or indexes are already in memory.

This means we need to be able to reference a field in the table to enforce the desired ordering... we need to know about that date field to write the correct SQL statement.

CodePudding user response：

with A as (
    select *,
        count(case when status  = 'START' then 1 end) over (order by "date") as grp
    from T
)
select *,
    sum(case when status in ('START', 'CONTINUATION_A', 'CONTINUATION_A&B') then 1 end)
        over (partition by grp order by "date") as A,
    sum(case when status in ('START', 'CONTINUATION_A&B') then 1 end)
        over (partition by grp order by "date") as B
from A;

https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=225a1e37236c18fbb7bdbb76d7ad93dc

This assumes that counters always begin at one. That could be adjusted if necessary with expressions like these:

min(A) over (partition by grp) - 1 /* offset for A */
min(B) over (partition by grp) - 1 /* offset for B */