I am trying to figure out how to get rid of results that occur close together. For example the rows have a create timestamp (source_time). I want to remove results that occur within 10 seconds of each other.
I thought lag() might do it, but I can't use that in the where clause.
select *
from table
where source_time - previous(source_time) >= 10 second
Very rough code, but I am not sure how to call the previous source time. I have translated them to timestamps and used timestamp_diff(source_time, x, second) >= 10 but not sure how to make x the previous value.
Hopefully this is clear.
CodePudding user response:
You can do this with subqueries.
delete table t1
where t1.id in (
select t2.id
from (
select
id,
source_time - lag(source_time) over (order by source_time) as time_diff
from table
) t2
where t2.time_diff < 10 second
)
Keep in mind this can potentially leave large gaps in your records if. For example, if you get a row every 9 seconds for an hour you'll delete all but the last record in that hour.
You might instead partition the source_time every 10 seconds and delete anything with a row_number > 1.
delete table t1
where t1.id in (
select t2.id
from (
select
id,
source_time,
row_number() over(
partition by source_time - make_interval(second => extract(second from source_time) % 10)
order by source_time asc
) rownum
from table
) t2
where rownum > 1
)