Currently I have the following table, and I want to aggregate it to some kind of a log of the status changes for each asset_id, that will include a start and end date.
ie, If this is my data:
Asset_Id | Status_Id | Update_Date |
---|---|---|
1 | 1 | 2021-06-29 01:00:00 |
1 | 1 | 2021-06-29 04:00:00 |
1 | 3 | 2021-06-29 05:00:00 |
1 | 3 | 2021-06-29 05:00:31 |
1 | 1 | 2021-06-29 05:01:00 |
1 | 1 | 2021-06-29 05:08:00 |
1 | 2 | 2021-06-30 12:12:12 |
1 | 2 | 2021-06-30 12:15:12 |
1 | 2 | 2021-07-30 04:12:12 |
I would like to aggregate it to:
Asset_Id | Status_Id | Start_date | End_Date |
---|---|---|---|
1 | 1 | 2021-06-29 01:00:00 | 2021-06-29 04:59:59 |
1 | 3 | 2021-06-29 05:00:00 | 2021-06-29 05:00:59 |
1 | 1 | 2021-06-29 05:01:00 | 2021-06-30 12:12:11 |
1 | 2 | 2021-06-30 12:12:12 | 2099-12-31 00:00:0 |
I've found similar proposed solutions, but none of them included the actual status_id and the end_date didn't included the time until the next row start time minus a second or minute, ie Get list with start and end values from table of datetimes
CodePudding user response:
Please try the following solution.
SQL
DECLARE @tbl TABLE (Asset_Id INT, Status_Id INT, Update_Date DATETIME);
INSERT @tbl (Asset_Id, Status_Id, Update_Date) VALUES
(1, 1, '2021-06-29 01:00:00'),
(1, 1, '2021-06-29 04:00:00'),
(1, 3, '2021-06-29 05:00:00'),
(1, 3, '2021-06-29 05:00:31'),
(1, 1, '2021-06-29 05:01:00'),
(1, 1, '2021-06-29 05:08:00'),
(1, 2, '2021-06-30 12:12:12'),
(1, 2, '2021-06-30 12:15:12'),
(1, 2, '2021-07-30 04:12:12');
;WITH rs AS
(
SELECT *
, series = status_id ROW_NUMBER() OVER (ORDER BY Update_Date ASC) -
ROW_NUMBER() OVER (PARTITION BY Asset_ID, Status_Id ORDER BY Update_Date ASC)
FROM @tbl
)
SELECT Asset_Id, Status_Id, MIN(Update_Date) AS Start_date, MAX(rs.Update_Date) AS End_Date
FROM rs
GROUP BY Asset_Id, Status_Id, series
ORDER BY MIN(Update_Date) ASC;
Output
---------- ----------- ------------------------- -------------------------
| Asset_Id | Status_Id | Start_date | End_Date |
---------- ----------- ------------------------- -------------------------
| 1 | 1 | 2021-06-29 01:00:00.000 | 2021-06-29 04:00:00.000 |
| 1 | 3 | 2021-06-29 05:00:00.000 | 2021-06-29 05:00:31.000 |
| 1 | 1 | 2021-06-29 05:01:00.000 | 2021-06-29 05:08:00.000 |
| 1 | 2 | 2021-06-30 12:12:12.000 | 2021-07-30 04:12:12.000 |
---------- ----------- ------------------------- -------------------------
CodePudding user response:
One option to solve this problem could be:
- generating a flag that identifies when a new group is encountered (when the value of "Status_Id" has changed) -
cte1
- computing a running sum over the flag, so that you'll be able to have the increasing groups -
cte2
- aggregating on the Update_Date and the next date, by grouping on the newly identified groups -
query
WITH cte1 AS (
SELECT *,
CASE WHEN Status_Id <> LAG(Status_Id) OVER(PARTITION BY Asset_Id ORDER BY Update_Date)
THEN 1
END AS Changed_Status
FROM tab
), cte2 AS (
SELECT *,
LEAD(Update_Date) OVER(PARTITION BY Asset_Id ORDER BY Update_Date) AS Last_Date,
SUM(Changed_Status) OVER(PARTITION BY Asset_Id ORDER BY Update_Date) AS New_groups
FROM cte1
)
SELECT Asset_Id,
Status_Id,
MIN(Update_Date) AS Start_Date,
MAX(Last_Date) AS End_Date
FROM cte2
GROUP BY Asset_Id,
Status_Id,
New_groups
ORDER BY New_groups
Check the demo here.
Note: Minor fixes should be carried out to specifically match your output (like removing one second from end dates, using a max value for the last row), though you can get the core idea from this solution.