My dataframe looks like this:
customer_nr | order_value | year_ordered | payment_successful |
---|---|---|---|
1 | 50 | 1980 | 1 |
1 | 75 | 2017 | 0 |
1 | 10 | 2020 | 1 |
2 | 55 | 2000 | 1 |
2 | 300 | 2007 | 1 |
2 | 15 | 2010 | 0 |
I want to know the total amount a customer has successfully paid in the years before, for a specific order.
The expected output is as follows:
customer_nr | order_value | year_ordered | payment_successful | total_successfully_previously_paid |
---|---|---|---|---|
1 | 50 | 1980 | 1 | 0 |
1 | 75 | 2017 | 0 | 50 |
1 | 10 | 2020 | 1 | 50 |
2 | 55 | 2000 | 1 | 0 |
2 | 300 | 2007 | 1 | 55 |
2 | 15 | 2010 | 0 | 355 |
Closest i've gotten is this:
df.groupby(['customer_nr', 'payment_successful'], as_index=False)['order_value'].sum()
That just gives me the summed amount successfully and unsuccessfully paid all time per customer. It doesn't account for selecting only previous orders to participate in the sum.
Any help is appreciated!
CodePudding user response:
Try:
df["total_successfully_previously_paid"] = (df["payment_successful"].mul(df["order_value"])
.groupby(df["customer_nr"])
.transform(lambda x: x.cumsum().shift().fillna(0))
)
>>> df
customer_nr ... total_successfully_previously_paid
0 1 ... 0.0
1 1 ... 50.0
2 1 ... 50.0
3 2 ... 0.0
4 2 ... 55.0
5 2 ... 355.0
[6 rows x 5 columns]