Home > Mobile >  Pandas: Group by and conditional sum based on value of current row
Pandas: Group by and conditional sum based on value of current row

Time:02-24

My dataframe looks like this:

customer_nr order_value year_ordered payment_successful
1 50 1980 1
1 75 2017 0
1 10 2020 1
2 55 2000 1
2 300 2007 1
2 15 2010 0

I want to know the total amount a customer has successfully paid in the years before, for a specific order.

The expected output is as follows:

customer_nr order_value year_ordered payment_successful total_successfully_previously_paid
1 50 1980 1 0
1 75 2017 0 50
1 10 2020 1 50
2 55 2000 1 0
2 300 2007 1 55
2 15 2010 0 355

Closest i've gotten is this:

df.groupby(['customer_nr', 'payment_successful'], as_index=False)['order_value'].sum()

That just gives me the summed amount successfully and unsuccessfully paid all time per customer. It doesn't account for selecting only previous orders to participate in the sum.

Any help is appreciated!

CodePudding user response:

Try:

df["total_successfully_previously_paid"] = (df["payment_successful"].mul(df["order_value"])
                                                                    .groupby(df["customer_nr"])
                                                                    .transform(lambda x: x.cumsum().shift().fillna(0))
                                            )

>>> df
   customer_nr  ...  total_successfully_previously_paid
0            1  ...                                 0.0
1            1  ...                                50.0
2            1  ...                                50.0
3            2  ...                                 0.0
4            2  ...                                55.0
5            2  ...                               355.0

[6 rows x 5 columns]
  • Related