Home > Software design >  Workout the time difference between rows for a given group and using specified column as start of ti
Workout the time difference between rows for a given group and using specified column as start of ti

Time:08-24

I have the following dataframe:

       id     key  serial    time
0  310000   first     100    2022-01-15 07:28:00
1  310000  second     100    2022-01-15 07:29:00 
3  450000   first     200    2022-01-15 07:45:00
4  450000  second     200    2022-01-15 07:46:00
5  450000   third     200    2022-01-15 07:47:00

I want to workout the time difference in seconds between rows given that:

  • They have same id and same serial
  • Start time for the group is if key=first exist. Otherwise null.

Ideal outcome:

       id     key  serial    time                  difference
0  310000   first     100    2022-01-15 07:28:00       0
1  310000  second     100    2022-01-15 07:29:00      60
3  450000   first     200    2022-01-15 07:45:00       0   
4  450000  second     200    2022-01-15 07:46:00      60
5  450000   third     200    2022-01-15 07:47:00     120      

CodePudding user response:

updated answer: difference from start of group

df['time'] = pd.to_datetime(df['time'])

df['difference'] = (df.groupby(['id', 'serial'])['time']
                    .transform(lambda g: g-g.iloc[0])
                    .dt.total_seconds()
                    .fillna(0, downcast='infer')
                    )

output:

       id     key  serial                time  difference
0  310000   first     100 2022-01-15 07:28:00           0
1  310000  second     100 2022-01-15 07:29:00          60
3  450000   first     200 2022-01-15 07:45:00           0
4  450000  second     200 2022-01-15 07:46:00          60
5  450000   third     200 2022-01-15 07:47:00         120

older answer

Use a groupby.diff:

df['time'] = pd.to_datetime(df['time'])

df['difference'] = (df.groupby(['id', 'serial'])['time']
                    .diff().fillna('0', downcast='infer')
                    )

Output:

       id     key  serial                time      difference
0  310000   first     100 2022-01-15 07:28:00 0 days 00:00:00
1  310000  second     100 2022-01-15 07:29:00 0 days 00:01:00
3  450000   first     200 2022-01-15 07:45:00 0 days 00:00:00
4  450000  second     200 2022-01-15 07:46:00 0 days 00:01:00
5  450000   third     200 2022-01-15 07:47:00 0 days 00:01:00

Seconds as integers:

df['time'] = pd.to_datetime(df['time'])

df['difference'] = (df.groupby(['id', 'serial'])['time']
                    .diff()
                    .dt.total_seconds()
                    .fillna(0, downcast='infer')
                    )

Output:

       id     key  serial                time  difference
0  310000   first     100 2022-01-15 07:28:00           0
1  310000  second     100 2022-01-15 07:29:00          60
3  450000   first     200 2022-01-15 07:45:00           0
4  450000  second     200 2022-01-15 07:46:00          60
5  450000   third     200 2022-01-15 07:47:00          60
  • Related