This is a task previously accomplished by a cursor in a really old T-SQL script that I now have to get rid of. For a person in a table ordered by dates, I have a value indicating a sequence is starting, then continuing, and then when a new one starts (indicating the old one has ended). I cannot figure out how to get each of these sequences to have row numbers. I had something similar in an R codebase a few years ago I used RLE for but this has me stumped. I need to get from this:
ID STATUS DATE A B
1 START 2000-01-01 1 1
1 CONTINUATION_A&B 2000-01-02 NULL NULL
1 CONTINUATION_A&B 2000-01-03 NULL NULL
1 START 2000-01-04 1 1
1 START 2000-01-05 1 1
1 CONTINUATION_A 2000-01-06 NULL NULL
1 CONTINUATION_A 2000-01-07 NULL NULL
To this:
ID STATUS DATE A B
1 START 2000-01-01 1 1
1 CONTINUATION_A&B 2000-01-02 2 2
1 CONTINUATION_A&B 2000-01-03 3 3
1 START 2000-01-04 1 1
1 START 2000-01-05 1 1
1 CONTINUATION_A 2000-01-06 2 1
1 CONTINUATION_A 2000-01-07 2 1
Thanks in advance.
CodePudding user response:
Not an answer, but important for eventually answering the question and too long for a comment (hence community wiki).
I see this:
in a table orderd by dates
... but I don't see any dates in the question. Where is the date field in your sample data? We need to at least know it's name to give you good code.
One thing to get drilled into your head is tables never have any inherent or natural order. While the primary key/clustered index or insert order may seem like a natural table order, there are plenty of things that can mess with this, and unless you are explicit in your code about the order of your records the database is free to give you results in any order it finds convenient. That is, if there's not a fully-deterministic ORDER BY
clause, the ordering of the results for the same query can and does change from moment to moment, depending on things like what other queries are currently running to access the same data or what pages or indexes are already in memory.
This means we need to be able to reference a field in the table to enforce the desired ordering... we need to know about that date field to write the correct SQL statement.
CodePudding user response:
with A as (
select *,
count(case when status = 'START' then 1 end) over (order by "date") as grp
from T
)
select *,
sum(case when status in ('START', 'CONTINUATION_A', 'CONTINUATION_A&B') then 1 end)
over (partition by grp order by "date") as A,
sum(case when status in ('START', 'CONTINUATION_A&B') then 1 end)
over (partition by grp order by "date") as B
from A;
https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=225a1e37236c18fbb7bdbb76d7ad93dc
This assumes that counters always begin at one. That could be adjusted if necessary with expressions like these:
min(A) over (partition by grp) - 1 /* offset for A */
min(B) over (partition by grp) - 1 /* offset for B */