I have a table of ships, which consists of:
- row id (number)
- ship id (character varying)
- timestamp (timestamp in yyyy-mm-dd hh:mm:ss format)
Timestamp is the time that the specific ship (ship id) emitted a signal during its course. The table looks like this:
What I need to do (in PostgreSQL - pgAdmin) is for every ship_id, find if a signal has been emitted 5 seconds or less after another signal from the same ship, and then delete the row with the latter.
In the example table shown above, for the ship "foo" the signals are almost 9 minutes apart so it's all good, but for the ship "bar" the signal with row_id 4 was emitted 3 seconds after the previous one with row_id 3, so it needs to go.
Thanks a lot in advance.
CodePudding user response:
Windowing functions Lag/Lead in this case will do the trick.
Add a LAG to calculate the difference between timestamps for the same ships. This will allow you to calculate the time difference for the same ship and its most recent posting.
Use that to filter out what to delete
SELECT ROW_ID, SHIP_ID, EXTRACT(EPOCH FROM (TIMESTAMP - LAG (TIMESTAMP,1) OVER (PARTITION BY SHIP_ID ORDER BY TIMESTAMP ASC))) AS SECONDS_DIFF
--THEN SOMETHING LIKE THIS TO FIND WHICH ROWS TO DELETE
DELETE FROM SHIP_TABLE WHERE ROW_ID IN
(SELECT ROW_ID FROM
(SELECT ROW_ID, SHIP_ID, EXTRACT(EPOCH FROM (TIMESTAMP - LAG (TIMESTAMP,1) OVER (PARTITION BY SHIP_ID ORDER BY TIMESTAMP ASC))) AS SECONDS_DIFF) SUB_1
WHERE SECONDS_DIFF <= 10 --THRESHOLD
) SUB_2