Hi, how could I conditionally create a column based on the difference with the next group? The dataframe is sorted.
country fruit time group group_start_time
UK apple 12:20 a 12:20
UK apple 12:22 a 12:20
UK banana 12:22 b 12:22
UK apple 12:26 c 12:26
The goal is that if the start times of the current and the next group differ by less than 5 minutes, they can be part of the same group. (Group B's start time is within 5 minutes of Group A's. Group C's start time is within 5 minutes of Group B's.)
Expected Output
country fruit time group group_start_time new_group
UK apple 12:20 a 12:20 a
UK apple 12:22 a 12:20 a
UK banana 12:22 b 12:22 a
UK apple 12:26 c 12:26 a
CodePudding user response:
Assuming you first sort the groups by start time, you can mask the rows whose group starts within 5 minutes of the previous row's group and then apply a ffill:
m = pd.to_timedelta(df['group_start_time'] + ':00').diff().le('5min')
df['new_group'] = df['group'].mask(m).ffill()
output:
country fruit time group group_start_time new_group
0 UK apple 12:20 a 12:20 a
1 UK apple 12:22 a 12:20 a
2 UK banana 12:22 b 12:22 a
3 UK apple 12:26 c 12:26 a
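For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same mask-and-ffill idea. The dataframe is rebuilt by hand from the question, and the fifth row (pear, 12:40, group d) is a hypothetical addition, not part of the original data, included only to show that a gap of more than 5 minutes starts a new group.

```python
import pandas as pd

# Sample data from the question; the last row (group "d") is a
# hypothetical addition to show what happens when the gap exceeds 5 minutes.
df = pd.DataFrame({
    "country": ["UK"] * 5,
    "fruit": ["apple", "apple", "banana", "apple", "pear"],
    "time": ["12:20", "12:22", "12:22", "12:26", "12:40"],
    "group": ["a", "a", "b", "c", "d"],
    "group_start_time": ["12:20", "12:20", "12:22", "12:26", "12:40"],
})

# "HH:MM" strings become timedeltas once ":00" seconds are appended.
start = pd.to_timedelta(df["group_start_time"] + ":00")

# True where a row's group starts at most 5 minutes after the previous row's.
# The first row has no predecessor (NaT), so it compares as False.
within_5min = start.diff().le(pd.Timedelta("5min"))

# Blank out the labels of rows that should merge into the previous group,
# then forward-fill so they inherit the last label that was kept.
df["new_group"] = df["group"].mask(within_5min).ffill()

new_groups = df["new_group"].tolist()
print(df)
```

Note that each row is compared with the row just before it, so groups chain together: c is 6 minutes after a but still merges, because it is within 5 minutes of b. If you have several countries, you would probably want to compute the diff per country (for example with groupby) rather than over the whole frame.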