Home > Enterprise >  Partition data, detect value change in column, and ADD new COLUMN with DATE of change
Partition data, detect value change in column, and ADD new COLUMN with DATE of change

Time:12-03

Given the data below, I would like to:
  • [x] Partition data by device
  • [x] Report back details of the latest reading
  • [ ] Add a new field capturing the date of the latest device state change
id device date_observation device_state reading
2021100 A 2021-10-14 00:00:00.000 1 -0.3623
2021101 B 2021-10-14 00:00:00.000 1 -0.0004
2021102 C 2021-10-14 00:00:00.000 1 0.1333
2021103 A 2021-10-21 00:00:00.000 1 0.1907
2021104 B 2021-10-21 00:00:00.000 2 -0.3548
2021105 C 2021-10-21 00:00:00.000 1 -0.2275
2021106 A 2021-10-28 00:00:00.000 2 -0.2404
2021107 B 2021-10-28 00:00:00.000 2 -0.0244
2021108 C 2021-10-28 00:00:00.000 1 -0.3441
2021109 A 2021-11-04 00:00:00.000 2 0.2843
2021110 B 2021-11-04 00:00:00.000 3 0.2376
2021111 C 2021-11-04 00:00:00.000 1 -0.0247
2021112 A 2021-11-11 00:00:00.000 2 0.1814
2021113 B 2021-11-11 00:00:00.000 3 -0.2816
2021114 C 2021-11-11 00:00:00.000 1 -0.1493
2021115 A 2021-11-18 00:00:00.000 2 0.1694
2021116 B 2021-11-18 00:00:00.000 4 0.0864
2021117 C 2021-11-18 00:00:00.000 1 0.0395
2021118 A 2021-11-25 00:00:00.000 2 0.0658
2021119 B 2021-11-25 00:00:00.000 4 0.1291
2021120 C 2021-11-25 00:00:00.000 2 -0.1907

Using the code below, I am able partition data by device and to grab the latest record for each device.

WITH o AS (
SELECT *,
    ROW_NUMBER() OVER (PARTITION by device
                       ORDER BY date_observation DESC) AS queue
FROM observations  
)

SELECT *
FROM o
WHERE queue = 1

Results

id device date_observation device_state reading queue
2021118 A 2021-11-25 00:00:00.000 2 0.0658 1
2021119 B 2021-11-25 00:00:00.000 4 0.1291 1
2021120 C 2021-11-25 00:00:00.000 2 -0.1907 1

What is the best way to get the date of the latest state change?

For example the latest device state changed for device A occurred on 2021-10-28.

The expected output.

id device date_observation device_state reading queue date_state_change
2021118 A 2021-11-25 00:00:00.000 2 0.0658 1 2021-10-28
2021119 B 2021-11-25 00:00:00.000 4 0.1291 1 2021-11-18
2021120 C 2021-11-25 00:00:00.000 2 -0.1907 1 2021-11-25

Thoughts? Thanks.

CodePudding user response:

You can use more window functions

  • Use LEAD to identify the change-over rows
  • Use MAX to get the latest change-over date
WITH PrevValues AS (
    SELECT *,
      queue = ROW_NUMBER() OVER (PARTITION BY device ORDER BY date_observation DESC),
      date_state_change = CASE WHEN device_state <>
            LEAD(device_state, 1, -1) OVER (PARTITION BY device ORDER BY date_observation DESC)
            THEN date_observation END
    FROM observations o
),
MaxValues AS (
    SELECT *,
      max_date_state_change = MAX(date_state_change) OVER (PARTITION BY device)
    FROM PrevValues o
)
SELECT
  o.id,
  o.device,
  o.date_observation,
  o.device_state,
  o.reading,
  o.queue,
  o.max_date_state_change
FROM MaxValues o
WHERE o.queue = 1

db<>fiddle

CodePudding user response:

I propose you a solution built on your first part (I adpated it as I would have built the query) and a flow to calculate the latest state's change, whith the following steps:

  • add on each row the previous state
  • add an indicator of state's change
  • calculate max date for rows identified as state's change
  • join the result to yours

Here is the result I propose:

WITH prev_data AS (
  SELECT id, device, date_observation, device_state, reading,
         LAG(device_state) OVER (partition by device order by date_observation) as prev_state
    FROM observations
),
data_changes AS (
  SELECT id, device, date_observation, device_state, reading,
         CASE WHEN device_state = prev_state THEN 0 else 1 END as change_state
    FROM prev_data
),
last_change AS (
  SELECT device, max(date_observation) as date_state_change
    FROM data_changes
   WHERE change_state = 1
   GROUP BY device
),
max_date as (
    SELECT id, device, date_observation, device_state, reading,
           max(date_observation) over (partition by device) as max_date
      FROM observations
)
SELECT id, m.device, date_observation, device_state, reading, date_state_change
  FROM max_date m
  LEFT JOIN last_change l ON m.device = l.device
 WHERE date_observation = max_date

Result :

  id     device     date_observation         device_state      reading         date_state_change
2021118     A    2021-11-25 00:00:00.000          2            0.0658       2021-10-28 00:00:00.000
2021119     B    2021-11-25 00:00:00.000          4            0.1291       2021-11-18 00:00:00.000
2021120     C    2021-11-25 00:00:00.000          2            -0.1907      2021-11-25 00:00:00.000

You can check the result using dbFiddle

CodePudding user response:

You can also use LAG() to compare current row device_state and the previous row device state for each device and use MAX() to get the date_state_change.

WITH
f1 AS (
SELECT *, MAX(date_observation) OVER (PARTITION BY device) AS max_date,
       CASE WHEN device_state <> LAG(device_state) OVER (PARTITION BY device ORDER BY date_observation) 
       THEN date_observation END AS date_flag
FROM observations
),
f2 AS (
SELECT *, MAX(date_flag) OVER (PARTITION BY device) AS date_state_change
FROM f1
)
SELECT id, device, date_observation, device_state, reading, date_state_change  
FROM f2
WHERE max_date = date_observation;

See Demo

  • Related