I am working on one of requirement the raw data is in following format
Requirement - Startdate should be the date when status changed to 1 and enddate should be the 1st date after the record status changed from 1 to any other number.
Customer | Status | Date |
---|---|---|
A123 | 0 | 7/2/2021 |
A123 | 0 | 7/15/2021 |
A123 | 0 | 7/22/2021 |
A123 | 1 | 8/18/2021 |
A123 | 1 | 9/8/2021 |
A123 | 0 | 12/1/2021 |
A123 | 0 | 1/21/2022 |
A123 | 1 | 3/6/2022 |
A123 | 1 | 3/7/2022 |
A123 | 0 | 3/15/2022 |
B123 | 1 | 1/1/2022 |
B123 | 0 | 1/6/2022 |
C123 | 1 | 1/2/2022 |
C123 | 2 | 1/8/2022 |
C123 | 0 | 1/9/2022 |
expected output
Customer | StartDate | EndDate |
---|---|---|
A123 | 8/18/2021 | 12/1/2021 |
A123 | 9/8/2021 | 12/1/2021 |
A123 | 3/6/2022 | 3/15/2022 |
A123 | 3/7/2022 | 3/15/2022 |
B123 | 1/1/2022 | 1/6/2022 |
C123 | 1/2/2022 | 1/8/2022 |
Query I tried to get the output is below, I am getting the output for Customer B123 and C123, but not for A123 as expected.
Query Explanation - In 1st part of query I am taking all the records with status = 1 and in next part taking only those records where status is not equal to 1, and joining these 2 datasets based on Customer and row number generated.
SELECT A.[Customer],A.StartDate,B.EndDate
from
(
SELECT [Customer],MIN(Date) AS STARTDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
FROM table1
WHERE [STATUS] = 1
GROUP BY Customer,Date,[Status]
) A
LEFT JOIN
(
SELECT [Customer],MIN(Date) AS ENDDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
FROM table1
WHERE [STATUS] != 1
AND Date>(
SELECT MIN(Date) AS STARTDATE
FROM table1
WHERE [STATUS] = 1
)
GROUP BY Customer,Date,[Status]
) B
ON
(
A.[Customer] = B.[Customer]
AND A.RowNum = B.RowNum
)
ORDER BY A.Startdate
CodePudding user response:
Here is a more efficient way to do this without a self-join.
WITH cte01only AS
( SELECT *, CASE Status WHEN 1 THEN 1 ELSE 0 END AS Status1 FROM table1 ),
cteDifference AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Date, Status1)
- ROW_NUMBER() OVER (PARTITION BY Customer, Status1 ORDER BY Date) AS StatusGroup
FROM cte01only
),
cteGroup AS
(
SELECT Customer, StatusGroup, Status1, MIN(Date) As StartDate
FROM cteDifference
GROUP BY Customer, StatusGroup, Status1
),
cteNextDate AS
(
SELECT Customer, StatusGroup, Status1, StartDate,
LEAD(StartDate, 1, NULL) OVER (PARTITION BY Customer ORDER BY StatusGroup) AS EndDate
FROM cteGroup
)
SELECT Customer, StartDate, EndDate
FROM cteNextDate
WHERE Status1 = 1
ORDER BY Customer, StateDate
The key trick here is the second CTE which uses the difference of two ROW_NUMBER()
functions to tag (Status1
) the customer records into separate partitions by contiguous runs of records whose status is 1 or not 1. After that they can be grouped according to that tag to get the start dates, and then use the LEAD()
function to get the following group's StartDate
as the current groupings EndDate
.
(There may be a more compact way to express this, but I like to layout each stage as a separate CTE.)
CodePudding user response:
First you list the rows where Status = 1
and then use CROSS APPLY
to get the corresponding minimum Date where the Status is not equal to 1
select s.[Customer],
StartDate = s.[Date],
EndDate = e.[Date]
from Table1 s
cross apply
(
select [Date] = min(e.[Date])
from Table1 e
where e.[Customer] = s.[Customer]
and e.[Date] > s.[Date]
and e.[Status] <> 1
) e
where s.[Status] = 1
order by s.[Customer], s.[Date]
CodePudding user response:
You can try to use CROSS APPLY
with TOP 1
to get the value greater than subquery.
SELECT t1.*,t2.Date
FROM (
SELECT Customer,Date
FROM table1
WHERE Status = 1
) t1 CROSS APPLY (
SELECT TOP 1 Date
FROM table1 tt
WHERE tt.Status <> 1
AND tt.Customer = t1.Customer
AND tt.Date > t1.Date
ORDER BY tt.Date
) t2