Pandas: Group by and conditional sum based on value of current row-CodePudding

My dataframe looks like this:

customer_nr	order_value	year_ordered	payment_successful
1	50	1980	1
1	75	2017	0
1	10	2020	1
2	55	2000	1
2	300	2007	1
2	15	2010	0

I want to know the total amount a customer has successfully paid in the years before, for a specific order.

The expected output is as follows:

customer_nr	order_value	year_ordered	payment_successful	total_successfully_previously_paid
1	50	1980	1	0
1	75	2017	0	50
1	10	2020	1	50
2	55	2000	1	0
2	300	2007	1	55
2	15	2010	0	355

Closest i've gotten is this:

df.groupby(['customer_nr', 'payment_successful'], as_index=False)['order_value'].sum()

That just gives me the summed amount successfully and unsuccessfully paid all time per customer. It doesn't account for selecting only previous orders to participate in the sum.

Any help is appreciated!

CodePudding user response：

Try:

df["total_successfully_previously_paid"] = (df["payment_successful"].mul(df["order_value"])
                                                                    .groupby(df["customer_nr"])
                                                                    .transform(lambda x: x.cumsum().shift().fillna(0))
                                            )

>>> df
   customer_nr  ...  total_successfully_previously_paid
0            1  ...                                 0.0
1            1  ...                                50.0
2            1  ...                                50.0
3            2  ...                                 0.0
4            2  ...                                55.0
5            2  ...                               355.0

[6 rows x 5 columns]