Home > Enterprise >  pandas get time delta between two conditions per group
pandas get time delta between two conditions per group

Time:11-09

I have a dataframe:

> df = batch Code.  time 
>       a     100.  2019-08-01 00:59:12.000
>       a     120.  2019-08-01 00:59:32.000
>       a     130.  2019-08-01 00:59:42.000
>       a     120.  2019-08-01 00:59:52.000
>       b     100.  2019-08-01 00:44:11.000
>       b     140.  2019-08-02 00:14:11.000
>       b     150.  2019-08-03 00:47:11.000
>       c     150.  2019-09-01 00:44:11.000
>       d     100.  2019-08-01 00:10:00.000
>       d     100.  2019-08-01 00:10:05.000
>       d     130.  2019-08-01 00:10:10.000
>       d     130.  2019-08-01 00:10:20.000

I want to get the number of seconds, per group, between the time of the first '100' code to the last '130' code. If for a group there is no code 100 with code 130 after (one of them is missing) - put nan. So the output should be:

df2 = batch duration
        a      30
        b.     nan
        c.     nan
        d.     20

What is the best way to do it?

CodePudding user response:

Use:

#convert values to datetimes
df['time'] = pd.to_datetime(df['time'])
   
#get first 100 Code per batch 
s1=df[df['Code.'].eq(100)].drop_duplicates('batch').set_index('batch')['time']
#get last 130 Code per batch 
s2=df[df['Code.'].eq(130)].drop_duplicates('batch', keep='last').set_index('batch')['time']

#subtract and convert to timedeltas
df = (s2.sub(s1)
        .dt.total_seconds()
        .reindex(df['batch'].unique())
        .reset_index(name='duration'))
print (df)
  batch  duration
0     a      30.0
1     b       NaN
2     c       NaN
3     d      20.0

CodePudding user response:

As an alternative:

batchs = pd.DataFrame(df['batch'].unique(),columns=['batch'])

df = df[(df['code'] == 100) | (df['code']==130)]
final=pd.concat([
    df.drop_duplicates(subset='code',keep='first'),
    df.drop_duplicates(subset='code',keep='last'),
])
final['duration'] = (final['time'].shift(-1) - final['time']).dt.total_seconds()

final = final.drop_duplicates('batch',keep='first').drop(['time','code'],axis=1).merge(batchs,on='batch',how='right')
final
    batch   duration
0   a       30.0
1   b       nan
2   c       nan
3   d       15.0

  • Related