Home > OS >  Is there a better way to summarize a time series by category start and end times in T-SQL?
Is there a better way to summarize a time series by category start and end times in T-SQL?

Time:02-11

I would like to summarize a time series of data by:

  1. Category - (1. Low, 2. Moderate, 3. Strong, 4. Extreme).
  2. The start and end times of the run of each Category.

The image below shows a snippet of the data and the desired output.

Data and Desired Output

  • Since the Categories (1. Low, 2. Moderate, 3. Strong, 4. Extreme) repeat throughout the timeline, an approach such as the SELECT below will summarize the MIN and MAX ReadingDateTime values over the entire table in four rows.

    • The minimum and maximum ReadingDateTime over all readings in the Low category.
    • The minimum and maximum ReadingDateTime over all readings in the Moderate category.
    • The minimum and maximum ReadingDateTime over all readings in the Strong category.
    • The minimum and maximum ReadingDateTime over all readings in the Extreme category.
    SELECT.. 
             MIN(ReadingDateTime)
            ,MAX(ReadingDateTime)
    FROM CurrentLogger
    GROUP BY LoggerId
            ,Category
    ORDER BY LoggerId
            ,ReadingDateTime

For example:

enter image description here

I have one approach, but I am sure there is a much better way. I mark the start and end of runs of Categories by using the SQL LAG and LEAD functions.

SELECT LoggerId
      ,Location
      ,Category
      ,dBDateTime
      ,FirstInRun = IIF(LAG(dBCategory,1,0) OVER(ORDER BY InstrumentId, dBDateTime)<>dBCategory,1,0)
      ,LastInRun = IIF(LEAD(dBCategory,1,0) OVER(ORDER BY InstrumentId, dBDateTime)<>dBCategory,1,0)
FROM CurrentLogger
ORDER BY InstrumentId 
        ,dBDateTime

The output is below:

enter image description here

There are several ugly and slow ways to roll this up to the desired output.

My question is:

  • Before rolling this up to the desired output, is there a far more obvious way to get the start and end times for each of the consecutive categorical runs?

CodePudding user response:

One way to roll it up is to use the ROW_NUMBER() function to create your groups and then get the MIN/MAX dates for each group. Consider the following:

CREATE TABLE #CurrentLogger(LoggerID int, Location varchar(20),Category varchar(20),ReadingDateTime datetime)
INSERT INTO #CurrentLogger VALUES
(10,'OutSide','1. Low','9/14/21 8:16:37 AM'),
(10,'OutSide','1. Low','9/14/21 8:16:39 AM'),
(10,'OutSide','1. Low','9/14/21 8:16:41 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:16:43 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:16:45 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:16:47 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:16:49 AM'),
(10,'OutSide','4. Extreme','9/14/21 8:16:51 AM'),
(10,'OutSide','3. Strong','9/14/21 8:16:53 AM'),
(10,'OutSide','3. Strong','9/14/21 8:16:55 AM'),
(10,'OutSide','3. Strong','9/14/21 8:16:57 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:16:59 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:17:01 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:17:03 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:17:05 AM'),
(10,'OutSide','1. Low','9/14/21 8:17:07 AM'),
(10,'OutSide','1. Low','9/14/21 8:17:09 AM'),
(10,'OutSide','1. Low','9/14/21 8:17:11 AM'),
(10,'OutSide','1. Low','9/14/21 8:17:13 AM'),
(10,'OutSide','1. Low','9/14/21 8:17:15 AM'),
(10,'OutSide','1. Low','9/14/21 8:17:17 AM');

WITH CTE AS
(
    SELECT *,
        ROW_NUMBER() OVER(PARTITION BY LoggerID ORDER BY ReadingDateTime)
        - ROW_NUMBER() OVER(PARTITION BY LoggerID,Location,Category ORDER BY ReadingDateTime) Grp
    FROM #CurrentLogger
)
SELECT LoggerID,Location,Category, MIN(ReadingDateTime) StartDate, MAX(ReadingDateTime) EndDate
FROM CTE
GROUP BY Grp,LoggerID,Location,Category
ORDER BY StartDate
  • Related