Pick the first two events for each id, where one must come after the other

Time:07-21

I have a dataset that looks like this:

id event time
1  open  2022-07-05
1  close 2021-05-05
2  open  2022-05-05
3  open  2019-07-12
1  close 2022-06-05
3  open  2018-07-12
3  close 2018-08-12
2  close 2023-05-05

I want to find the first occurrence of each event for each id. It is important that close comes after open:

id event time
1  open  2022-07-05
1  close 2021-05-05
2  open  2022-05-05
3  open  2018-07-12
3  close 2018-08-12
2  close 2023-05-05

CodePudding user response:

Update

It is important that close goes after open

I slightly modified your dataframe so that id 2 has a close event before its open event:

   id  event        time
0   1   open  2022-07-05
1   1  close  2021-05-05
2   2  close  2022-04-04  # close event occurs before open event
3   2   open  2022-05-05
4   3   open  2019-07-12
5   1  close  2022-06-05
6   3   open  2018-07-12
7   3  close  2018-08-12
8   2  close  2023-05-05
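
To follow along, the modified dataframe can be rebuilt roughly like this. It is only a sketch: the values are copied from the table above, and converting time with pd.to_datetime is my own assumption about the intended dtype.

```python
import pandas as pd

# Rebuild the modified sample data (values copied from the table above)
df = pd.DataFrame({
    'id':    [1, 1, 2, 2, 3, 1, 3, 3, 2],
    'event': ['open', 'close', 'close', 'open', 'open',
              'close', 'open', 'close', 'close'],
    'time':  ['2022-07-05', '2021-05-05', '2022-04-04', '2022-05-05',
              '2019-07-12', '2022-06-05', '2018-07-12', '2018-08-12',
              '2023-05-05'],
})

# Assumption: treat time as real datetimes so sorting is chronological
df['time'] = pd.to_datetime(df['time'])
print(df)
```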

You can use:

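# Per id: keep rows from the first 'open' onward, then the first row of each event
keep_first = lambda x: x[x['event'].eq('open').cumsum().gt(0)].drop_duplicates(['id', 'event'])
# Sort 'open' before 'close' (descending event name), earliest time first
out = (df.sort_values(['event', 'time'], ascending=[False, True])
         .groupby('id').apply(keep_first).droplevel(0))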
print(out)

# Output
   id  event       time
0   1   open 2022-07-05
1   1  close 2021-05-05
3   2   open 2022-05-05
2   2  close 2022-04-04
6   3   open 2018-07-12
7   3  close 2018-08-12
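
If "after" is meant chronologically, a possible variant is to sort by time only, so that the cumulative-sum mask actually discards any close that precedes the first open of its id. This is a sketch under that assumption, not the code above; the names s and seen_open are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id':    [1, 1, 2, 2, 3, 1, 3, 3, 2],
    'event': ['open', 'close', 'close', 'open', 'open',
              'close', 'open', 'close', 'close'],
    'time':  ['2022-07-05', '2021-05-05', '2022-04-04', '2022-05-05',
              '2019-07-12', '2022-06-05', '2018-07-12', '2018-08-12',
              '2023-05-05'],
})
df['time'] = pd.to_datetime(df['time'])

# Sort chronologically within each id (oldest first)
s = df.sort_values(['id', 'time'])

# True from the first 'open' of each id onward, False for rows before it
seen_open = s['event'].eq('open').astype(int).groupby(s['id']).cumsum().gt(0)

# Drop rows before the first open, then keep the first row per (id, event)
out = s[seen_open].drop_duplicates(['id', 'event'])
print(out)
```

On the modified sample this keeps only the open row for id 1, because both of its close events are dated before its open, and for id 2 it keeps the 2023-05-05 close rather than the 2022-04-04 one.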

CodePudding user response:

First sort by id and time, then extract the open-close pairs per id:

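df['time'] = pd.to_datetime(df['time'])
# Sort by id, then by time (most recent first within each id)
df = df.sort_values(['id','time'], ascending=[True, False])

# m1: an 'open' row whose next row within the same id is a 'close'
m1 = df['event'].eq('open') & df.groupby('id')['event'].shift(-1).eq('close')
# m2: a 'close' row whose previous row within the same id is an 'open'
m2 = df['event'].eq('close') & df.groupby('id')['event'].shift().eq('open')

# Keep only the rows that belong to an adjacent open-close pair
df2 = df[m1 | m2]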

Then, if there are multiple pairs per id, remove the duplicates:

df2 = df2.drop_duplicates(['id', 'event'])
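
For reference, here is a self-contained sketch of the same shift-based pairing on the sample from the question. It assumes that "close goes after open" is meant chronologically, so it sorts time in ascending order; that is an assumption about intent, and the names s, g and pairs are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'id':    [1, 1, 2, 3, 1, 3, 3, 2],
    'event': ['open', 'close', 'open', 'open',
              'close', 'open', 'close', 'close'],
    'time':  ['2022-07-05', '2021-05-05', '2022-05-05', '2019-07-12',
              '2022-06-05', '2018-07-12', '2018-08-12', '2023-05-05'],
})
df['time'] = pd.to_datetime(df['time'])

# Assumption: chronological order, oldest first within each id
s = df.sort_values(['id', 'time'])
g = s.groupby('id')['event']

# An 'open' directly followed by a 'close' within the same id
open_then_close = s['event'].eq('open') & g.shift(-1).eq('close')
# A 'close' directly preceded by an 'open' within the same id
close_after_open = s['event'].eq('close') & g.shift().eq('open')

# Keep the paired rows, then only the first pair per id
pairs = s[open_then_close | close_after_open].drop_duplicates(['id', 'event'])
print(pairs)
```

With this ordering id 1 produces no pair at all, since both of its close events are dated before its only open. Whether that is the desired result depends on how "after" should be read.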