Python Group By and Case When Equivalent Relative to R-CodePudding

I am trying to do a case when/if-else statement in Python on a grouped dataframe to create a new variable. I would want to do the following if I was coding in R and I am trying find an operation in Python that's similar and also vectorized. R code:

dt %>% group_by(user,merchant,date) %>%
mutate(
new_variable = case_when(-amount == lag(amount) ~ 2,
                         True ~ 1)
) %>% ungroup()

In Python I have tried using np.select:

    conditions = [
    (-us_trans['real_amount'] == us_trans['real_amount'].shift(-1)),
    (-us_trans['real_amount'] != us_trans['real_amount'].shift(-1))

]

    values = [
        2, 
        1
        
    ]

But I don't know how to use np.select on a grouped dataframe to create a new variable.

I know that I can use groupby(['user','merchant','date'].apply and pass an if-else statement but I believe this would be done in a loop and I am trying to do it in a vectorized manner to optimize my code.

Thanks!

CodePudding user response：

Assuming your data is in a pandas dataframe, df, you could use datatable , along with shift(), ifelse() and by()

from datatable import dt, f, by

df = dt.Frame(df)

df[:,
   dt.update(new_variable=dt.ifelse(-1*dt.shift(f.amount)==f.amount,2,1)),
   by(f.user,f.merchant,f.date)
]