Assume I have the following dataframe
A B C
5 1 0.7
7 1 0.7
-1 1 0.7
-3 1 0.7
12 1 0.7
I'd like to multiply A and B or A and C based on a the cumulative sum of previous multiplications.
First iteration, the cum sum is 0 so we shall multiply A and B thus get
A B C cum sum
5 1 0.7 5
7 1 0.7
-1 1 0.7
-3 1 0.7
12 1 0.7
cum sum is less than 10 so we shall again multiply A and B, thus get
A B C cum sum
5 1 0.7 5
7 1 0.7 12
-1 1 0.7
-3 1 0.7
12 1 0.7
Now the cum sum is larger than 10 and thus we shall multiply A and C and thus get
A B C cum sum
5 1 0.7 5
7 1 0.7 12
-1 1 0.7 11.3
-3 1 0.7
12 1 0.7
Continuing this the expected output would be
A B C cum sum
5 1 0.7 5
7 1 0.7 12
-1 1 0.7 11.3
-3 1 0.7 9.2
12 1 0.7 21.2
a = {'A':[5, 7, -1, -3, 12]}
b = {'B': [1, 1, 1, 1, 1]}
c = {'C' : [0.7, 0.7, 0.7, 0.7, 0.7]}
a.update(b)
a.update(c)
df = pd.DataFrame.from_dict(a)
How can I achieve this?
CodePudding user response:
A for
loop would work
cs = [0]
for _, row in df.iterrows():
if cs[-1] > 10:
curr = cs[-1] row.A * row.C
else:
curr = cs[-1] row.A * row.B
cs = [curr]
pandas.Series(cs[1:])
# 0 5.0
# 1 12.0
# 2 11.3
# 3 9.2
# 4 21.2
# dtype: float64
CodePudding user response:
Another way using a for
loop but instead iterating over a list of values of A in a custom function:
def filtered_cumsum(a, factor=0.7):
result = [a[0]]
for num in a[1:]:
result.append(result[-1] num*(1 if result[-1]<10 else factor))
return result
df["cumsum"] = filtered_cumsum(df["A"].tolist())
>>> df
A cumsum
0 5 5.0
1 7 12.0
2 -1 11.3
3 -3 9.2
4 12 21.2
5 -78 -33.4
6 -98 -131.4
7 91 -40.4
8 32 -8.4
9 -7 -15.4
10 37 21.6
11 70 70.6