Mask using groupby apply function

Time:07-09

df
   person   time_bought  product    
42 abby     10min        fruit        
12 abby     5min         fruit      
10 abby     10min        other
3  barry    12min        fruit      
...
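
The calculations in this thread call diff() and gt() on time_bought, which needs numbers rather than strings such as "10min". A minimal sketch for rebuilding the four visible rows, assuming every value looks like "<digits>min" and should become integer minutes (the elided rows are not reconstructed):

```python
import pandas as pd

# Only the four rows shown above; the "..." rows are unknown and left out.
df = pd.DataFrame(
    {
        "person": ["abby", "abby", "abby", "barry"],
        "time_bought": ["10min", "5min", "10min", "12min"],
        "product": ["fruit", "fruit", "other", "fruit"],
    },
    index=[42, 12, 10, 3],
)

# Assumption: values always look like "<digits>min"; keep the digits only.
df["time_bought"] = (
    df["time_bought"].str.extract(r"(\d+)", expand=False).astype(int)
)
```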

How could I convert the lines below into a generalisable function? I use groupby all the time.

ref = df.groupby('person')['time_bought'].shift()
# df.product resolves to the DataFrame.product method, not the column, so index with brackets
m1 = df.loc[df['product']=="fruit", 'time_bought'].groupby(df['person']).diff().gt(ref)
m2 = df['product'].ne('fruit')
df['new_group'] = m1.groupby(df['person']).cumsum().add(1).mask(m2) # gives a column

I tried the function below, but it doesn't work, and my other attempts didn't return a column. I read that you shouldn't hard-code column names inside a function:

def gen_new_group(df, col, interested_in, var):
    ref = df[var].shift()
    m1 = df.loc[df.col==interested_in, var].diff().gt(ref)
    m2 = df[col].ne(interested_in)
    df['new_group'] = m1.cumsum().add(1).mask(m2)
    return df.new_group

df['new_group'] = df.groupby('person').apply(gen_new_group, col='product', interested_in="fruit", var="time_bought")

CodePudding user response:

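Use x[col] instead of x.col (attribute access looks for a column literally named "col", not the column whose name is stored in the variable), and return the whole group x instead of a single column: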

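def gen_new_group(x, col, interested_in, var):
    # x is the sub-frame for one group (one person)
    ref = x[var].shift()                                   # previous row's value within the group
    m1 = x.loc[x[col]==interested_in, var].diff().gt(ref)  # x[col], not x.col
    m2 = x[col].ne(interested_in)                          # rows that should end up as NaN
    x['new_group'] = m1.cumsum().add(1).mask(m2)
    return x                                               # the whole group, not one column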

df = (df.groupby('person')
        .apply(gen_new_group, col='product', interested_in="fruit", var="time_bought"))
    
print (df)
   person  time_bought product  new_group
42   abby           10   fruit        1.0
12   abby            5   fruit        1.0
10   abby           10   other        NaN
3   barry           12   fruit        1.0
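
For newer pandas versions, a hedged variant of the answer's function may be easier to run. Since pandas 2.0, apply prepends the group key to the index unless group_keys=False is passed, and recent releases warn when apply operates on the grouping column. The sketch below therefore selects only the two columns the function needs and writes the result back by index. It also returns a copy via assign instead of mutating the group, and reindexes the diff to the full group before comparing. Those two changes are additions of this sketch, not part of the answer above. time_bought is assumed to hold numeric minutes.

```python
import pandas as pd


def gen_new_group(x, col, interested_in, var):
    """Return a copy of group `x` with a `new_group` column added."""
    ref = x[var].shift()                  # previous row's value within the group
    is_target = x[col].eq(interested_in)
    # Diff over the target rows only, then reindex to the full group so both
    # sides of the comparison share one index (a safeguard added here).
    step = x.loc[is_target, var].diff().reindex(x.index)
    m1 = step.gt(ref)
    labels = m1.cumsum().add(1).mask(~is_target)  # NaN for non-target rows
    return x.assign(new_group=labels)


# The four sample rows from the question, with time_bought as integers.
df = pd.DataFrame(
    {
        "person": ["abby", "abby", "abby", "barry"],
        "time_bought": [10, 5, 10, 12],
        "product": ["fruit", "fruit", "other", "fruit"],
    },
    index=[42, 12, 10, 3],
)

out = (
    df.groupby("person", group_keys=False)[["product", "time_bought"]]
      .apply(gen_new_group, col="product", interested_in="fruit", var="time_bought")
)
df["new_group"] = out["new_group"]  # aligned on the original index
print(df)
```

On the four sample rows this yields new_group values of 1.0, 1.0, NaN and 1.0, matching the output above.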