In a table I have time-series data that looks like the following:
t_stamp | status_val |
---|---|
2022-04-21 8:00 AM | 0 |
2022-04-21 8:01 AM | 0 |
2022-04-21 8:02 AM | 3 |
2022-04-21 8:03 AM | 1 |
2022-04-21 8:04 AM | 1 |
2022-04-21 8:05 AM | 0 |
The example above is very simplified. In reality there is one row per second, and status_val may stay the same for hundreds of rows before it changes.
I'm trying to write a query that returns the first row and, after that, only the rows where status_val differs from the previous row. So the output should be this:
t_stamp | status_val |
---|---|
2022-04-21 8:00 AM | 0 |
2022-04-21 8:02 AM | 3 |
2022-04-21 8:03 AM | 1 |
2022-04-21 8:05 AM | 0 |
Almost like using LEAD(status_val,1) <> status_val in the WHERE clause... but that obviously doesn't work, since window functions aren't allowed in WHERE clauses.
I've tried using GROUP BY and DISTINCT, but the results were not the desired output. I'm not an expert with Postgres, so please be gentle :)
CodePudding user response:
Use LAG() inside a CTE to flag the rows that repeat the previous row's status_val, then keep only the rows that are not flagged:
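    with mark_repeats as (
        select t_stamp, status_val,
               -- true when this row has the same status_val as the previous row;
               -- the first row has no previous row, so lag() yields null and
               -- coalesce turns that into false (the row is kept)
               coalesce(
                   status_val = lag(status_val) over (order by t_stamp),
                   false
               ) as is_repeat
        from your_table
    )
    select t_stamp, status_val
    from mark_repeats
    where not is_repeat
    order by t_stamp;  -- make the output order explicit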
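
To try the idea without touching a real table, here is a minimal self-contained sketch that runs against the six sample rows from the question. The names sample_data, with_prev, prev_val and rn are made up for illustration. It is a variant of the query above rather than a replacement: it compares with IS DISTINCT FROM, which treats NULL as an ordinary value, and keeps the first row explicitly via ROW_NUMBER(). That only matters if status_val can be NULL, which the question does not say.

```sql
-- Hypothetical inline sample data standing in for the real table.
with sample_data (t_stamp, status_val) as (
    values
        (timestamp '2022-04-21 08:00', 0),
        (timestamp '2022-04-21 08:01', 0),
        (timestamp '2022-04-21 08:02', 3),
        (timestamp '2022-04-21 08:03', 1),
        (timestamp '2022-04-21 08:04', 1),
        (timestamp '2022-04-21 08:05', 0)
),
with_prev as (
    select t_stamp,
           status_val,
           -- value of status_val on the previous row (null on the first row)
           lag(status_val) over (order by t_stamp) as prev_val,
           -- position of the row in timestamp order
           row_number() over (order by t_stamp) as rn
    from sample_data
)
select t_stamp, status_val
from with_prev
where rn = 1                                -- always keep the first row
   or status_val is distinct from prev_val  -- keep rows where the value changed
order by t_stamp;
```

On the sample data this should return the four rows from the desired output in the question (8:00, 8:02, 8:03 and 8:05). If status_val is declared NOT NULL, the simpler COALESCE version above behaves the same way.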