Given the data below, I would like to:
- [x] Partition data by device
- [x] Report back details of the latest reading
- [ ] Add a new field capturing the date of the latest device state change
id | device | date_observation | device_state | reading |
---|---|---|---|---|
2021100 | A | 2021-10-14 00:00:00.000 | 1 | -0.3623 |
2021101 | B | 2021-10-14 00:00:00.000 | 1 | -0.0004 |
2021102 | C | 2021-10-14 00:00:00.000 | 1 | 0.1333 |
2021103 | A | 2021-10-21 00:00:00.000 | 1 | 0.1907 |
2021104 | B | 2021-10-21 00:00:00.000 | 2 | -0.3548 |
2021105 | C | 2021-10-21 00:00:00.000 | 1 | -0.2275 |
2021106 | A | 2021-10-28 00:00:00.000 | 2 | -0.2404 |
2021107 | B | 2021-10-28 00:00:00.000 | 2 | -0.0244 |
2021108 | C | 2021-10-28 00:00:00.000 | 1 | -0.3441 |
2021109 | A | 2021-11-04 00:00:00.000 | 2 | 0.2843 |
2021110 | B | 2021-11-04 00:00:00.000 | 3 | 0.2376 |
2021111 | C | 2021-11-04 00:00:00.000 | 1 | -0.0247 |
2021112 | A | 2021-11-11 00:00:00.000 | 2 | 0.1814 |
2021113 | B | 2021-11-11 00:00:00.000 | 3 | -0.2816 |
2021114 | C | 2021-11-11 00:00:00.000 | 1 | -0.1493 |
2021115 | A | 2021-11-18 00:00:00.000 | 2 | 0.1694 |
2021116 | B | 2021-11-18 00:00:00.000 | 4 | 0.0864 |
2021117 | C | 2021-11-18 00:00:00.000 | 1 | 0.0395 |
2021118 | A | 2021-11-25 00:00:00.000 | 2 | 0.0658 |
2021119 | B | 2021-11-25 00:00:00.000 | 4 | 0.1291 |
2021120 | C | 2021-11-25 00:00:00.000 | 2 | -0.1907 |
Using the code below, I am able partition data by device and to grab the latest record for each device.
WITH o AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION by device
ORDER BY date_observation DESC) AS queue
FROM observations
)
SELECT *
FROM o
WHERE queue = 1
Results
id | device | date_observation | device_state | reading | queue |
---|---|---|---|---|---|
2021118 | A | 2021-11-25 00:00:00.000 | 2 | 0.0658 | 1 |
2021119 | B | 2021-11-25 00:00:00.000 | 4 | 0.1291 | 1 |
2021120 | C | 2021-11-25 00:00:00.000 | 2 | -0.1907 | 1 |
What is the best way to get the date of the latest state change?
For example the latest device state changed for device A occurred on 2021-10-28.
The expected output.
id | device | date_observation | device_state | reading | queue | date_state_change |
---|---|---|---|---|---|---|
2021118 | A | 2021-11-25 00:00:00.000 | 2 | 0.0658 | 1 | 2021-10-28 |
2021119 | B | 2021-11-25 00:00:00.000 | 4 | 0.1291 | 1 | 2021-11-18 |
2021120 | C | 2021-11-25 00:00:00.000 | 2 | -0.1907 | 1 | 2021-11-25 |
Thoughts? Thanks.
CodePudding user response:
You can use more window functions
- Use
LEAD
to identify the change-over rows - Use
MAX
to get the latest change-over date
WITH PrevValues AS (
SELECT *,
queue = ROW_NUMBER() OVER (PARTITION BY device ORDER BY date_observation DESC),
date_state_change = CASE WHEN device_state <>
LEAD(device_state, 1, -1) OVER (PARTITION BY device ORDER BY date_observation DESC)
THEN date_observation END
FROM observations o
),
MaxValues AS (
SELECT *,
max_date_state_change = MAX(date_state_change) OVER (PARTITION BY device)
FROM PrevValues o
)
SELECT
o.id,
o.device,
o.date_observation,
o.device_state,
o.reading,
o.queue,
o.max_date_state_change
FROM MaxValues o
WHERE o.queue = 1
CodePudding user response:
I propose you a solution built on your first part (I adpated it as I would have built the query) and a flow to calculate the latest state's change, whith the following steps:
- add on each row the previous state
- add an indicator of state's change
- calculate max date for rows identified as state's change
- join the result to yours
Here is the result I propose:
WITH prev_data AS (
SELECT id, device, date_observation, device_state, reading,
LAG(device_state) OVER (partition by device order by date_observation) as prev_state
FROM observations
),
data_changes AS (
SELECT id, device, date_observation, device_state, reading,
CASE WHEN device_state = prev_state THEN 0 else 1 END as change_state
FROM prev_data
),
last_change AS (
SELECT device, max(date_observation) as date_state_change
FROM data_changes
WHERE change_state = 1
GROUP BY device
),
max_date as (
SELECT id, device, date_observation, device_state, reading,
max(date_observation) over (partition by device) as max_date
FROM observations
)
SELECT id, m.device, date_observation, device_state, reading, date_state_change
FROM max_date m
LEFT JOIN last_change l ON m.device = l.device
WHERE date_observation = max_date
Result :
id device date_observation device_state reading date_state_change
2021118 A 2021-11-25 00:00:00.000 2 0.0658 2021-10-28 00:00:00.000
2021119 B 2021-11-25 00:00:00.000 4 0.1291 2021-11-18 00:00:00.000
2021120 C 2021-11-25 00:00:00.000 2 -0.1907 2021-11-25 00:00:00.000
You can check the result using dbFiddle
CodePudding user response:
You can also use LAG() to compare current row device_state and the previous row device state for each device and use MAX() to get the date_state_change.
WITH
f1 AS (
SELECT *, MAX(date_observation) OVER (PARTITION BY device) AS max_date,
CASE WHEN device_state <> LAG(device_state) OVER (PARTITION BY device ORDER BY date_observation)
THEN date_observation END AS date_flag
FROM observations
),
f2 AS (
SELECT *, MAX(date_flag) OVER (PARTITION BY device) AS date_state_change
FROM f1
)
SELECT id, device, date_observation, device_state, reading, date_state_change
FROM f2
WHERE max_date = date_observation;
See Demo