How can I create the new_group column? A fruit row stays in the current group if it is within 10 minutes of the row above when that row is fruit, or within 2 minutes when that row is other. The DataFrame is sorted.
person time_bought product new_group
abby 2:21 fruit 1
abby 2:25 fruit 1 (2:25 is within 10 minutes of 2:21, so part of the same group)
abby 10:35 fruit 2
abby 10:40 other
abby 10:42 fruit 2 (10:42 is within 2 minutes of 10:40, the "other" row above)
abby 10:53 fruit 3 (10:53 is not within 10 minutes of 10:42)
barry 12:00 fruit 1
...
I tried:
m1 = df.loc[df['product'].eq('fruit'), 'time_bought'].groupby(df['person']).diff().gt('10min')
m2 = df.product.shift(1)=="other"
m3 = df.loc[df['product'].eq('fruit'), 'time_bought'].groupby(df['person']).diff().gt('2min')
df['new_group'] = m1.cumsum().mask(m2, m3)
CodePudding user response:
IIUC, you can use a dictionary to hold the thresholds, keyed by the product of the previous row, then use a variation of your code:
import pandas as pd
thresh = {'fruit': pd.Timedelta('10min'), 'other': pd.Timedelta('2min')}
# map custom threshold based on previous row product
ref = df.groupby('person')['product'].shift().map(thresh)
# compare each delta to the custom threshold
m1 = pd.to_datetime(df['time_bought']).groupby(df['person']).diff().gt(ref)
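# rows that are not fruit get no group number
m2 = df['product'].ne('fruit')
# each exceeded threshold starts a new group; number the groups per person from 1
df['new_group'] = m1.groupby(df['person']).cumsum().add(1).mask(m2)
Output: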
person time_bought product new_group
0 abby 2:21 fruit 1.0
1 abby 2:25 fruit 1.0
2 abby 10:35 fruit 2.0
3 abby 10:40 other NaN
4 abby 10:42 fruit 2.0
5 abby 10:53 fruit 3.0
6 barry 12:00 fruit 1.0
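The group numbers come out as floats because masked rows hold NaN. If integer labels are preferred, here is a minimal self-contained sketch of the same approach. It assumes the times are 'H:MM' strings within a single day, rebuilds only the rows shown in the question, and casts the result to the nullable Int64 dtype. The variable names times, gap, and new_start are illustrative, not from the original answer.

```python
import pandas as pd

# Sample data reconstructed from the rows shown in the question
df = pd.DataFrame({
    'person': ['abby'] * 6 + ['barry'],
    'time_bought': ['2:21', '2:25', '10:35', '10:40', '10:42', '10:53', '12:00'],
    'product': ['fruit', 'fruit', 'fruit', 'other', 'fruit', 'fruit', 'fruit'],
})

# Assumption: 'H:MM' strings on the same day, so an explicit format is safe
times = pd.to_datetime(df['time_bought'], format='%H:%M')

# Threshold chosen by the product of the previous row (per person)
thresh = {'fruit': pd.Timedelta('10min'), 'other': pd.Timedelta('2min')}
ref = df.groupby('person')['product'].shift().map(thresh)

# Gap to the previous row of the same person; the first row of each person is NaT
gap = times.groupby(df['person']).diff()

# True where the gap exceeds the threshold, i.e. where a new group starts
new_start = gap.gt(ref)

# Running count of group starts per person, numbered from 1
group = new_start.groupby(df['person']).cumsum().add(1)

# Keep labels only on fruit rows and store them as nullable integers
df['new_group'] = group.where(df['product'].eq('fruit')).astype('Int64')
print(df)
```

With the Int64 dtype, the "other" row shows <NA> instead of NaN and the fruit rows carry integer labels.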