df:
Id activity sequence timestamp
1 start 1 2020-06-12 09:51:42
1 end 2 2020-06-12 09:51:42
1 start 1 2020-06-12 09:58:52
1 end 2 2020-06-12 10:12:22
I want to drop the middle part of the repeating process and keep only the first and last timestamp.
This is the output I hoped for:
df:
Id activity sequence timestamp
1 start 1 2020-06-12 09:51:42
1 end 2 2020-06-12 09:58:52
Thanks in advance
CodePudding user response:
Try .loc
update:
df.loc[df['activity']=='end','timestamp'] = df.loc[df['activity']=='start', 'timestamp'].values
CodePudding user response:
I think you are trying to group by Id, activity, and sequence and take the minimum or maximum timestamp, rather than drop duplicates.
Split your df into the start rows and the end rows, group each part to get the minimum timestamp for start and the maximum for end, and then concat the results.
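The steps above could look roughly like this. It is only a sketch: the sample frame is rebuilt from the question, and the names `keys`, `starts`, `ends`, and `result` are made up for illustration. Note that on this sample the maximum for end comes out as 10:12:22, not the 09:58:52 shown in the desired output, so check which value you actually need.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 1],
    "activity": ["start", "end", "start", "end"],
    "sequence": [1, 2, 1, 2],
    "timestamp": pd.to_datetime([
        "2020-06-12 09:51:42",
        "2020-06-12 09:51:42",
        "2020-06-12 09:58:52",
        "2020-06-12 10:12:22",
    ]),
})

# Grouping columns suggested in the answer (assumed to identify one process).
keys = ["Id", "activity", "sequence"]

# Earliest timestamp per group among the "start" rows.
starts = df[df["activity"] == "start"].groupby(keys, as_index=False)["timestamp"].min()

# Latest timestamp per group among the "end" rows.
ends = df[df["activity"] == "end"].groupby(keys, as_index=False)["timestamp"].max()

# Stack the two parts back into a single frame.
result = pd.concat([starts, ends], ignore_index=True)
print(result)
```

If you have several Ids, the same code gives one start row and one end row per Id; sort by Id afterwards if you want them side by side.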