How to get latest row value if duplicate entries are there-CodePudding

I've data like below:

ID   Task          Time
1    X started     T1
2    X ended       T2
3    X started     T3    [wrong entry in data]
4    X started     T4
5    X ended       T5 
6    Y started     T6    [wrong entry in data]
7    Y started     T7
8    Y ended       T8

And, I need to get the data from above in started/ended fashion, but in case of wrong entry I need to pickup the latest one [as T4>T3 and T7>T6]. How can I write SQL on above dataset to get below result ?

ID   Task          Time
1    X started     T1
2    X ended       T2
4    X started     T4
5    X ended       T5 
7    Y started     T7
8    Y ended       T8

CodePudding user response：

use a subquery in the where to match the ID, the subquery returns the ID of the row that matches the task (in the primary query), if there is more than one row with the same task value, it will return the row with the greatest id value.

SELECT * FROM `mytable` p1 
where id = (select id from `mytable` p2 
               where p2.task= p1.task
                order by id DESC limit 1);

CodePudding user response：

You may use the difference of two row_number()s to define unique groups for the consecutive duplicate values of Task column, then get the max of Time and ID columns, try the following:

select max(ID), Task, max(Time) Time
from
(
  select *,
    row_number() over (order by Time) -
    row_number() over (partition by Task order by Time) grp
  from table_name
) T
group by Task, grp
order by max(Time)

See a demo.

For MySQL 5.7 try the following:

set @gr=0;
set @ts=null;

select max(ID), Task, max(Time) 
Time
from
(
  select *,
    if(@ts<>Task, @gr:=@gr 1, 
@gr) grp,
    @ts:=Task
  from table_name
  order by Time
) T
group by Task, grp
order by max(Time)

Demo.