Calculate difference between groups

Time:05-31

Hi, how could I conditionally create a column based on the difference from the next group? The dataframe is sorted.

country  fruit   time   group   group_start_time
UK       apple   12:20   a      12:20
UK       apple   12:22   a      12:20
UK       banana  12:22   b      12:22 
UK       apple   12:26   c      12:26

The goal is that if the difference between the start times of the current and the next group is under 5 minutes, they can be part of the same group. (Group B's start time is within 5 minutes of Group A's. Group C's start time is within 5 minutes of Group B's.)

Expected Output

country  fruit   time   group  group_start_time   new_group
UK       apple   12:20   a      12:20                 a
UK       apple   12:22   a      12:20                 a
UK       banana  12:22   b      12:22                 a
UK       apple   12:26   c      12:26                 a

CodePudding user response:

Assuming you first sort the groups by start time, you can mask the group labels whose start time is within 5 min of the previous row's start time and then apply a ffill:

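# True where the start time is at most 5 min after the previous row's start time
m = pd.to_timedelta(df['group_start_time'] + ':00').diff().le(pd.Timedelta('5min'))
# Blank out those labels, then carry the last kept label forward
df['new_group'] = df['group'].mask(m).ffill()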

output:

  country   fruit   time group group_start_time new_group
0      UK   apple  12:20     a            12:20         a
1      UK   apple  12:22     a            12:20         a
2      UK  banana  12:22     b            12:22         a
3      UK   apple  12:26     c            12:26         a
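
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same mask + ffill idea. The frame is rebuilt by hand from the table in the question, and the intermediate names (start, close_to_previous) are only illustrative, not from the original answer. It assumes the rows are already sorted by group_start_time and that all times fall on the same day.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'country': ['UK', 'UK', 'UK', 'UK'],
    'fruit': ['apple', 'apple', 'banana', 'apple'],
    'time': ['12:20', '12:22', '12:22', '12:26'],
    'group': ['a', 'a', 'b', 'c'],
    'group_start_time': ['12:20', '12:20', '12:22', '12:26'],
})

# Append ':00' so the strings are in HH:MM:SS form, then parse as timedeltas
start = pd.to_timedelta(df['group_start_time'] + ':00')

# True where the start time is at most 5 minutes after the previous row's;
# the first row has no predecessor (NaT), so it compares as False
close_to_previous = start.diff().le(pd.Timedelta('5min'))

# Drop the label on those rows, then carry the last kept label forward
df['new_group'] = df['group'].mask(close_to_previous).ffill()

print(df)
```

Because each row is compared with the row directly before it, groups chain together: c ends up in a because it is close to b, even though it starts 6 minutes after a. A row whose group starts more than 5 minutes after the previous row keeps its own label and becomes the label that later rows are filled with.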