I would like to create a column ('amount' in this example) in a pandas DataFrame 'df', where the value in each row depends on the preceding rows as well as on the value in another column, 'id'. For example: if a given 'id' has already been assigned 30 in the 'amount' column in an earlier row, the value should be 0; otherwise it should be 30.
The expected outcome is shown below:
id amount
a 30
b 30
a 0
a 0
c 30
a 0
c 0
b 0
b 0
a 0
a 0
I thought I could accomplish this through some combination of groupby and lambda, but sadly I've repeatedly hit a wall.
What I tried out was:
df['amount'] = df.apply(lambda x: 30 if df.groupby('id')['amount'].cumsum()<30 else 0)
This gives me the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I apologize in advance if the solution is obvious, but unfortunately, I haven't been able to find anything so far that would solve this.
Thanks
CodePudding user response:
You can add a helper column that holds the previous row's 'amount' value, like so:
import numpy as np
df1["pastcol"] = [np.nan] + list(df1["amount"])[:-1]
Output:
id amount pastcol
0 a 30 NaN
1 b 30 30.0
2 a 0 30.0
3 a 0 0.0
4 c 30 0.0
5 a 0 30.0
6 c 0 0.0
7 b 0 0.0
8 b 0 0.0
9 a 0 0.0
10 a 0 0.0
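For what it's worth, the same previous-row column can probably be built with pandas' own Series.shift(), which moves every value down one row and leaves NaN in the first. A minimal sketch, assuming df1 holds the 'id' and 'amount' values shown in the output above (the frame is rebuilt here only so the snippet runs on its own):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the output table above (an assumption for illustration).
df1 = pd.DataFrame({
    "id": list("abaacacbbaa"),
    "amount": [30, 30, 0, 0, 30, 0, 0, 0, 0, 0, 0],
})

# Manual version: prepend NaN and drop the last value.
manual = [np.nan] + list(df1["amount"])[:-1]

# shift() moves each value down by one row; the first row becomes NaN.
df1["pastcol"] = df1["amount"].shift()
```

Note that this only looks one row back across the whole frame; it does not look at 'id' at all.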
CodePudding user response:
I thankfully was able to answer my own question. For anyone who is interested, I was successful with the following approach:
df['amount'] = df['amount'].where(df.groupby('id')['amount'].shift().cumsum() < 30, 30)
Thanks to everyone who shared their ideas!
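As far as I can tell, the line above relies on 'amount' already existing in df (for instance filled with 0) before it runs. If the rule is simply "30 on the first occurrence of each 'id', 0 afterwards", a different technique that may be simpler is to flag repeated ids with Series.duplicated(). This is a sketch under that assumption, not the approach from the answer above, and the sample frame is rebuilt from the question's expected output:

```python
import numpy as np
import pandas as pd

# Sample ids taken from the expected output in the question.
df = pd.DataFrame({"id": list("abaacacbbaa")})

# duplicated() is False for the first occurrence of each id and True for repeats,
# so first occurrences get 30 and every later row for that id gets 0.
df["amount"] = np.where(df["id"].duplicated(), 0, 30)

print(df)
```

This needs no pre-existing 'amount' column and no groupby, but it only fits the case where the value is assigned once per id.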