How can I calculate sum by overlapping time intervals by grouping each name?-CodePudding

How can I calculate sum by overlapping time intervals by grouping each name. Basically the smaller interval should be merged with larger interval if that group name.

input

df1 = (pd.DataFrame({'name': ['a', 'a', 'a', 'b', 'b'],
              'time_start': ['2000-01-01 00:01:12',
                            '2000-01-01 00:01:14',
                            '2000-01-01 00:03:12',
                            '2000-01-01 00:05:12',
                            '2000-01-01 00:05:16'],
              'time_end': ['2000-01-01 00:01:18',
                            '2000-01-01 00:01:16',
                            '2000-01-01 00:03:24',
                            '2000-01-01 00:05:40',
                            '2000-01-01 00:05:18'],
                    'values':[20,30,40,20,5]})
 .assign(time_start = lambda x: pd.to_datetime(x['time_start']),
        time_end = lambda x: pd.to_datetime(x['time_end'])))

output should be

name    time_start  time_end    values
0   a   2000-01-01 00:01:12 2000-01-01 00:01:18 50
1   a   2000-01-01 00:03:12 2000-01-01 00:03:24 40
2   b   2000-01-01 00:05:12 2000-01-01 00:05:40 25

CodePudding user response：

You can use a groupby.shift then groupby.agg:

df1[['time_start', 'time_end']] = df1[['time_start', 'time_end']].apply(pd.to_datetime)

g = (~df1['time_start'].lt(df1.groupby('name')['time_end'].shift())).cumsum()

out = (df1.groupby(['name', g], as_index=False)
          .agg({'time_start': 'min',
                'time_end': 'max',
                'values': 'sum'})
      )

Output:

  name          time_start            time_end  values
0    a 2000-01-01 00:01:12 2000-01-01 00:01:18      50
1    a 2000-01-01 00:03:12 2000-01-01 00:03:24      40
2    b 2000-01-01 00:05:12 2000-01-01 00:05:40      25