Home > OS >  SQL - Find end date and start date according to timestamp record
SQL - Find end date and start date according to timestamp record

Time:06-28

Currently I have the following table, and I want to aggregate it to some kind of a log of the status changes for each asset_id, that will include a start and end date.

ie, If this is my data:

Asset_Id Status_Id Update_Date
1 1 2021-06-29 01:00:00
1 1 2021-06-29 04:00:00
1 3 2021-06-29 05:00:00
1 3 2021-06-29 05:00:31
1 1 2021-06-29 05:01:00
1 1 2021-06-29 05:08:00
1 2 2021-06-30 12:12:12
1 2 2021-06-30 12:15:12
1 2 2021-07-30 04:12:12

I would like to aggregate it to:

Asset_Id Status_Id Start_date End_Date
1 1 2021-06-29 01:00:00 2021-06-29 04:59:59
1 3 2021-06-29 05:00:00 2021-06-29 05:00:59
1 1 2021-06-29 05:01:00 2021-06-30 12:12:11
1 2 2021-06-30 12:12:12 2099-12-31 00:00:0

I've found similar proposed solutions, but none of them included the actual status_id and the end_date didn't included the time until the next row start time minus a second or minute, ie Get list with start and end values from table of datetimes

CodePudding user response:

Please try the following solution.

SQL

DECLARE @tbl TABLE (Asset_Id INT, Status_Id INT, Update_Date DATETIME);
INSERT @tbl (Asset_Id, Status_Id, Update_Date) VALUES
(1, 1, '2021-06-29 01:00:00'),
(1, 1, '2021-06-29 04:00:00'),
(1, 3, '2021-06-29 05:00:00'),
(1, 3, '2021-06-29 05:00:31'),
(1, 1, '2021-06-29 05:01:00'),
(1, 1, '2021-06-29 05:08:00'),
(1, 2, '2021-06-30 12:12:12'),
(1, 2, '2021-06-30 12:15:12'),
(1, 2, '2021-07-30 04:12:12');

;WITH rs AS
(
    SELECT * 
        , series = status_id   ROW_NUMBER() OVER (ORDER BY Update_Date ASC) - 
            ROW_NUMBER() OVER (PARTITION BY Asset_ID, Status_Id ORDER BY Update_Date ASC)
    FROM @tbl
)
SELECT Asset_Id, Status_Id, MIN(Update_Date) AS Start_date, MAX(rs.Update_Date) AS End_Date
FROM rs
GROUP BY Asset_Id, Status_Id, series
ORDER BY MIN(Update_Date) ASC;

Output

 ---------- ----------- ------------------------- ------------------------- 
| Asset_Id | Status_Id |       Start_date        |        End_Date         |
 ---------- ----------- ------------------------- ------------------------- 
|        1 |         1 | 2021-06-29 01:00:00.000 | 2021-06-29 04:00:00.000 |
|        1 |         3 | 2021-06-29 05:00:00.000 | 2021-06-29 05:00:31.000 |
|        1 |         1 | 2021-06-29 05:01:00.000 | 2021-06-29 05:08:00.000 |
|        1 |         2 | 2021-06-30 12:12:12.000 | 2021-07-30 04:12:12.000 |
 ---------- ----------- ------------------------- ------------------------- 

CodePudding user response:

One option to solve this problem could be:

  • generating a flag that identifies when a new group is encountered (when the value of "Status_Id" has changed) - cte1
  • computing a running sum over the flag, so that you'll be able to have the increasing groups - cte2
  • aggregating on the Update_Date and the next date, by grouping on the newly identified groups - query
WITH cte1 AS (
    SELECT *, 
           CASE WHEN Status_Id <> LAG(Status_Id) OVER(PARTITION BY Asset_Id ORDER BY Update_Date)
                THEN 1 
                END AS Changed_Status
    FROM tab
), cte2 AS (
    SELECT *,
           LEAD(Update_Date) OVER(PARTITION BY Asset_Id ORDER BY Update_Date) AS Last_Date,
           SUM(Changed_Status) OVER(PARTITION BY Asset_Id ORDER BY Update_Date) AS New_groups
    FROM cte1
)
SELECT Asset_Id,
       Status_Id,
       MIN(Update_Date) AS Start_Date,
       MAX(Last_Date)   AS End_Date
FROM cte2
GROUP BY Asset_Id,
         Status_Id,
         New_groups
ORDER BY New_groups

Check the demo here.

Note: Minor fixes should be carried out to specifically match your output (like removing one second from end dates, using a max value for the last row), though you can get the core idea from this solution.

  • Related