I am strugling with some T-SQL code to get column [Num] (in below example table). Basically, I am trying to create an index on when a new sequence of [Type] is starting.
Q: How to create the [num] field below?
[Num] is the order of sequential ranges of field [Type]. You see that from 2022-01-01 to 2022-01-03 [Type] = 'Working'. Then on 2022-01-07, a new range of sequential rows with the same type starts (only one day though) and therefore it is 2. Basically, if there is no sequential days of the same type and another day pops up with a different type, the range stops. When a new day starts with that same type (> 1 days later), a new range starts. You can assume there cannot be different types on one day in my case.
What makes it tricky for me is that I cannot partition it on Type as it is recurring in a new sequence later on using a regular row_number.
Some extra explanation why I need this: I am creating a SCD dimension with a valid from and valid to date by taking the min and max grouped by [Type]. But since [Type] is recurring this does not work. I need a way to make each nonsequential series of [Type] unique. I am of course open to other suggestions if my solution is sub optimal.
Table_1
Date | Type | Num |
---|---|---|
2022-01-01 | Working | 1 |
2022-01-02 | Working | 1 |
2022-01-03 | Working | 1 |
2022-01-04 | Resting | 1 |
2022-01-05 | Resting | 1 |
2022-01-06 | Resting | 1 |
2022-01-07 | Working | 2 |
2022-01-08 | Resting | 2 |
2022-01-09 | Resting | 2 |
2022-01-10 | Resting | 2 |
2022-01-11 | Resting | 2 |
2022-01-12 | Resting | 2 |
2022-01-12 | Traveling | 1 |
2022-01-13 | Working | 3 |
2022-01-14 | Traveling | 2 |
2022-01-15 | Working | 4 |
2022-01-16 | Working | 4 |
Thank you so much!
Update:
I omit the information that is in my belief not necessary.
The end goal of this question is to create a table as below (table_2). If I have field [num] in table_1 I will be able to create table_2 myself but I am open to suggestions as well of course.
Table_2
Date | Type | Start Date | End Date |
---|---|---|---|
2022-01-01 | Working | 2022-01-01 | 2022-01-03 |
2022-01-04 | Resting | 2022-01-04 | 2022-01-06 |
2022-01-07 | Working | 2022-01-07 | 2022-01-07 |
2022-01-08 | Resting | 2022-01-08 | 2022-01-12 |
2022-01-12 | Traveling | 2022-01-12 | 2022-01-12 |
2022-01-13 | Working | 2022-01-13 | 2022-01-13 |
2022-01-14 | Traveling | 2022-01-14 | 2022-01-14 |
2022-01-15 | Working | 2022-01-15 | 2022-01-16 |
CodePudding user response:
This is a classic gaps and island problem. Find your islands, and then DENSE_RANK
them:
WITH Grps AS(
SELECT V.Date,
V.Type,
V.Num,
ROW_NUMBER() OVER (ORDER BY V.Date) -
ROW_NUMBER() OVER (PARTITION BY V.[Type] ORDER BY V.[Date]) AS Grp
FROM (VALUES(CONVERT(date,'2022-01-01'),'Working',1),
(CONVERT(date,'2022-01-02'),'Working',1),
(CONVERT(date,'2022-01-03'),'Working',1),
(CONVERT(date,'2022-01-04'),'Resting',1),
(CONVERT(date,'2022-01-05'),'Resting',1),
(CONVERT(date,'2022-01-06'),'Resting',1),
(CONVERT(date,'2022-01-07'),'Working',2),
(CONVERT(date,'2022-01-08'),'Resting',2),
(CONVERT(date,'2022-01-09'),'Resting',2),
(CONVERT(date,'2022-01-10'),'Resting',2),
(CONVERT(date,'2022-01-11'),'Resting',2),
(CONVERT(date,'2022-01-12'),'Resting',2),
(CONVERT(date,'2022-01-12'),'Traveling',1),
(CONVERT(date,'2022-01-13'),'Working',3),
(CONVERT(date,'2022-01-14'),'Traveling',2),
(CONVERT(date,'2022-01-15'),'Working',4),
(CONVERT(date,'2022-01-16'),'Working',4))V([Date], [Type], Num))
SELECT G.Date,
G.Type,
G.Num AS YourNum,
DENSE_RANK() OVER (PARTITION BY G.Type ORDER BY Grp) AS CalcedNum
FROM Grps G
ORDER BY G.Date;