Difference between pandas groups by condition

Time:06-13

How can I create the new_group column? A fruit row should start a new group when it comes more than 10 minutes after the row above if that row is fruit, or more than 2 minutes after the row above if that row is other. The DataFrame is sorted.

person   time_bought  product    new_group
abby     2:21         fruit        1
abby     2:25         fruit        1  (2:25 is within 10 minutes of 2:21, so part of the same group)
abby     10:35        fruit        2  
abby     10:40        other
abby     10:42        fruit        2  (10:42 is within 2 minutes of 10:40, the "other" row above)
abby     10:53        fruit        3  (10:53 is not within 10 minutes of 10:42)
barry    12:00        fruit        1
...

I tried

m1 = df.loc[df['product'].eq('fruit'), 'time_bought'].groupby(df['person']).diff().gt('10min')
m2 = df.product.shift(1)=="other"
m3 = df.loc[df['product'].eq('fruit'), 'time_bought'].groupby(df['person']).diff().gt('2min')
df['new_group'] = m1.cumsum().mask(m2, m3)

CodePudding user response:

If I understand correctly, you can use a dictionary to hold the per-product thresholds, then use a variation of your code:

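import pandas as pd

# maximum gap allowed after a row of each product before a new group starts
thresh = {'fruit': pd.Timedelta('10min'), 'other': pd.Timedelta('2min')}
# map custom threshold based on previous row product (missing on each person's first row)
ref = df.groupby('person')['product'].shift().map(thresh)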

# compare each delta to the custom threshold
m1 = pd.to_datetime(df['time_bought']).groupby(df['person']).diff().gt(ref)
m2 = df['product'].ne('fruit')

df['new_group'] = m1.groupby(df['person']).cumsum().add(1).mask(m2)

output:

  person time_bought product  new_group
0   abby        2:21   fruit        1.0
1   abby        2:25   fruit        1.0
2   abby       10:35   fruit        2.0
3   abby       10:40   other        NaN
4   abby       10:42   fruit        2.0
5   abby       10:53   fruit        3.0
6  barry       12:00   fruit        1.0
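
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It rests on a few assumptions that are not in the original post: the sample frame is rebuilt by hand from the rows shown in the question (the elided rows are left out), the times are taken to be same-day `H:MM` strings so an explicit `format` can be passed, and the final cast to the nullable `Int64` dtype is an optional extra to get integer labels instead of floats. The names `times`, `prev_product`, `delta` and `new_start` are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question (assumption: no other rows).
df = pd.DataFrame({
    'person': ['abby'] * 6 + ['barry'],
    'time_bought': ['2:21', '2:25', '10:35', '10:40', '10:42', '10:53', '12:00'],
    'product': ['fruit', 'fruit', 'fruit', 'other', 'fruit', 'fruit', 'fruit'],
})

# Assumption: all times fall on the same day and look like 'H:MM'.
times = pd.to_datetime(df['time_bought'], format='%H:%M')

# Allowed gap depends on the product of the previous row for the same person.
thresh = {'fruit': pd.Timedelta('10min'), 'other': pd.Timedelta('2min')}
prev_product = df.groupby('person')['product'].shift()
ref = prev_product.map(thresh)

# Gap to the previous row, computed separately for each person.
delta = times.groupby(df['person']).diff()

# True where the gap exceeds the threshold, i.e. where a new group starts.
# The first row of each person compares missing values and is therefore False.
new_start = delta.gt(ref)

# Running count of group starts per person, 1-based, kept only on fruit rows.
group = new_start.groupby(df['person']).cumsum().add(1)
group = group.where(df['product'].eq('fruit'))

# Optional: nullable integers keep the missing value on the 'other' row.
df['new_group'] = group.astype('Int64')
print(df)
```

Note that `gt` is a strict comparison, so a gap of exactly 2 minutes after an "other" row (10:40 to 10:42) stays in the current group, which matches the expected output in the question.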