How can I delete duplicate data based on the common value (Start and End)
(Time is unique key)
My table is:
Time | Data |
---|---|
10:24:11 | Start |
10:24:12 | Result |
10:24:13 | Result |
10:24:14 | End |
10:24:15 | Start |
10:24:16 | Result |
10:24:17 | End |
I want to get Data: Result in between Start and End that is with the MAX(TIME) when duplication does occur. as such
The result that I want:
Time | Data |
---|---|
10:24:11 | Start |
10:24:13 | Result |
10:24:14 | End |
10:24:15 | Start |
10:24:16 | Result |
10:24:17 | End |
I have tried rearranging the data, but couldn't seems to get the result that I want, Could someone give their advice on this case?
CodePudding user response:
Hi a first approach will be to use a lead function as folow
select hour,status from (select *,lead(status,1) over ( order by hour) as lead
from newtable)compare
where compare.lead <> status
OR lead is null
Give me what's expected using a postgres engine.
CodePudding user response:
You can do this sort of thing with SQL procedures.
-- create the table with only two columns
CREATE TABLE actions (attime TIME UNIQUE, data VARCHAR(10));
-- drop the procedure if it exists
DROP PROCEDURE del_duplicates IF EXISTS;
create procedure del_duplicates() MODIFIES SQL DATA begin atomic
DECLARE last_time time(0) default null;
for_loop:
-- loop over the rows in order
FOR SELECT * FROM actions ORDER BY attime DO
-- each time 'Start' is found, clear the last_time variable
IF data = 'Start' THEN
SET last_time = NULL;
ITERATE for_loop;
END IF;
-- each time 'Result' is found, delete the row with previous time
-- if last_time is null, no row is actually deleted
IF data = 'Result' THEN
DELETE FROM actions WHERE attime = last_time;
-- then store the latest time
SET last_time = attime;
ITERATE for_loop;
END IF;
END FOR;
END
Your data must all belong to a single day, otherwise there will be strange overlaps that cannot be distinguished. It is better to use TIMESTAMP instead of TIME.