I am trying to write a case-when/if-else statement in Python on a grouped dataframe to create a new variable. In R I would do the following, and I am trying to find a similar operation in Python that is also vectorized. R code:
dt %>% group_by(user,merchant,date) %>%
mutate(
new_variable = case_when(-amount == lag(amount) ~ 2,
TRUE ~ 1)
) %>% ungroup()
In Python I have tried using np.select:
conditions = [
(-us_trans['real_amount'] == us_trans['real_amount'].shift(1)),  # shift(1) is the previous row, like lag() in R
(-us_trans['real_amount'] != us_trans['real_amount'].shift(1))
]
values = [
2,
1
]
But I don't know how to use np.select on a grouped dataframe to create a new variable.
I know that I can use groupby(['user','merchant','date']).apply and pass an if-else statement, but I believe this would run as a loop, and I am trying to do it in a vectorized manner to optimize my code.
Thanks!
CodePudding user response:
Assuming your data is in a pandas dataframe, df, you could use datatable, along with shift(), ifelse() and by():
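from datatable import dt, f, by

# Convert the pandas dataframe to a datatable Frame
df = dt.Frame(df)

# Within each (user, merchant, date) group, compare each amount
# with the negated previous amount and add the result as a new column
df[:,
   dt.update(new_variable=dt.ifelse(-1*dt.shift(f.amount)==f.amount,2,1)),
   by(f.user,f.merchant,f.date)
]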
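
If you would rather stay in pandas, a grouped shift() is itself vectorized, so something along these lines might also work. This is only a minimal sketch: the sample data is made up for illustration, and the column names (user, merchant, date, amount) are assumed from the R snippet in the question. With only two outcomes, np.where is enough; np.select would take the same grouped shift if more conditions were needed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the R snippet in the question.
df = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "merchant": ["m1", "m1", "m1", "m2", "m2"],
    "date": ["2021-01-01"] * 3 + ["2021-01-02"] * 2,
    "amount": [10.0, -10.0, 5.0, 7.0, 7.0],
})

# Previous amount within each (user, merchant, date) group, like dplyr's lag().
# The first row of each group has no previous value and gets NaN.
prev_amount = df.groupby(["user", "merchant", "date"])["amount"].shift(1)

# 2 where the row negates the previous row's amount, otherwise 1.
# Comparisons against NaN are False, so the first row of each group gets 1.
df["new_variable"] = np.where(-df["amount"] == prev_amount, 2, 1)

print(df)
```

The groupby(...).shift() call returns a Series aligned to the original index, so it can be compared directly against the amount column without apply.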