Home > Back-end >  How to delete duplicates data that is in between two common value?
How to delete duplicates data that is in between two common value?

Time:12-21

How can I delete duplicate data based on the common value (Start and End)

(Time is unique key)

My table is:

Time Data
10:24:11 Start
10:24:12 Result
10:24:13 Result
10:24:14 End
10:24:15 Start
10:24:16 Result
10:24:17 End

I want to get Data: Result in between Start and End that is with the MAX(TIME) when duplication does occur. as such

The result that I want:

Time Data
10:24:11 Start
10:24:13 Result
10:24:14 End
10:24:15 Start
10:24:16 Result
10:24:17 End

I have tried rearranging the data, but couldn't seems to get the result that I want, Could someone give their advice on this case?

CodePudding user response:

Hi a first approach will be to use a lead function as folow

 select hour,status from (select *,lead(status,1) over ( order by hour) as lead
from newtable)compare 
where compare.lead <> status
OR lead is null

Give me what's expected using a postgres engine.

CodePudding user response:

You can do this sort of thing with SQL procedures.

-- create the table with only two columns
CREATE TABLE actions (attime TIME UNIQUE, data VARCHAR(10));

-- drop the procedure if it exists
DROP PROCEDURE del_duplicates IF EXISTS;

create procedure del_duplicates()  MODIFIES SQL DATA begin atomic
DECLARE last_time time(0) default null;
 for_loop:
 -- loop over the rows in order
 FOR SELECT * FROM actions ORDER BY attime DO

  -- each time 'Start' is found, clear the last_time variable
  IF data = 'Start' THEN
   SET last_time = NULL; 
   ITERATE for_loop;
  END IF;

  -- each time 'Result' is found, delete the row with previous time
  -- if last_time is null, no row is actually deleted 
  IF data = 'Result' THEN
   DELETE FROM actions WHERE attime = last_time;
   -- then store the latest time
   SET last_time = attime; 
   ITERATE for_loop;
  END IF;
        
END FOR;        
END

Your data must all belong to a single day, otherwise there will be strange overlaps that cannot be distinguished. It is better to use TIMESTAMP instead of TIME.

  • Related