df
    person time_bought product
42    abby       10min   fruit
12    abby        5min   fruit
10    abby       10min   other
3    barry       12min   fruit
...
How could I convert the lines below into a generalisable function, since I use groupby all the time?
ref = df.groupby('person')['time_bought'].shift()
m1 = df.loc[df['product']=="fruit", 'time_bought'].groupby(df['person']).diff().gt(ref)  # df.product would return the DataFrame.product method, not the column
m2 = df['product'].ne('fruit')
df['new_group'] = m1.groupby(df['person']).cumsum().add(1).mask(m2) # gives a column
I tried the function below, but it doesn't work, and in other attempts it doesn't return a column. I read that you should not hard-code actual column names in a function:
def gen_new_group(df, col, interested_in, var):
    ref = df[var].shift()
    m1 = df.loc[df.col==interested_in, var].diff().gt(ref)
    m2 = df[col].ne(interested_in)
    df['new_group'] = m1.cumsum().add(1).mask(m2)
    return df.new_group

df['new_group'] = df.groupby('person').apply(gen_new_group, col='product', interested_in="fruit", var="time_bought")
CodePudding user response:
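Use x[col] instead of x.col, because x.col looks for a column literally named "col" rather than the column whose name is stored in the variable col. Also return the whole group x instead of a single column: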
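def gen_new_group(x, col, interested_in, var):
    # x is the sub-DataFrame holding the rows of one person
    ref = x[var].shift()
    m1 = x.loc[x[col] == interested_in, var].diff().gt(ref)
    m2 = x[col].ne(interested_in)
    # rows whose col value is not interested_in get NaN
    x['new_group'] = m1.cumsum().add(1).mask(m2)
    return x

# group_keys=False keeps the original index instead of adding 'person' as an extra index level
df = (df.groupby('person', group_keys=False)
        .apply(gen_new_group, col='product', interested_in="fruit", var="time_bought"))
print(df)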
   person  time_bought product  new_group
42   abby           10   fruit        1.0
12   abby            5   fruit        1.0
10   abby           10   other        NaN
3   barry           12   fruit        1.0