I need to update two columns of a DataFrame with two calculations that depend on each other: computing B requires C, and computing C requires B. From the second row onward, each row should use the values just calculated for the previous row, and so on (so "sequentially"), all without for loops. To explain:
import pandas as pd
d = {'A':[1,2,3,4,5], 'B':[2,2,2,2,2], 'C':[3,3,3,3,3]}
data = pd.DataFrame(data=d)
data:
   A  B  C
0  1  2  3
1  2  2  3
2  3  2  3
3  4  2  3
4  5  2  3
...
Then the calculations are:
data['B'] = data['C'] / data['A']
data['C'] = data['A'] * data['B'] + data['A']
And it results in:
data:
   A     B    C
0  1  3.00  4.0
1  2  1.50  5.0
2  3  1.00  6.0
3  4  0.75  7.0
4  5  0.60  8.0
when instead I'm trying to get:
   A     B     C
0  1  3.00   4.0
1  2  2.00   6.0
2  3  2.00   9.0
3  4  2.25  13.0
4  5  2.60  18.0
The problem is clearly that all of column B is calculated before column C, instead of both being calculated row by row.
How can I achieve this without loops?
CodePudding user response:
Looks like the column B doesn't do much
data['C'] = data['A'].cumsum()
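The one-liner above captures the running-sum idea but leaves out the starting value of C and the recalculation of B. A minimal sketch of the full vectorized version, assuming A contains no zeros and that only the first value of C matters as a starting point (the names c_start, new_c and prev_c are illustrative, not from the question):

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 2, 2, 2, 2], 'C': [3, 3, 3, 3, 3]})

# C_i = A_i * (C_{i-1} / A_i) + A_i simplifies to C_{i-1} + A_i,
# so C is the starting C plus the running sum of A.
c_start = data['C'].iloc[0]
new_c = c_start + data['A'].cumsum()

# B_i = C_{i-1} / A_i; the first row has no predecessor, so it uses the starting C.
prev_c = new_c.shift(1, fill_value=c_start)
data['B'] = prev_c / data['A']
data['C'] = new_c

print(data)
```

This only works because B cancels out of the recurrence for C; a formula where it does not cancel would still need a row-by-row approach.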
CodePudding user response:
The example you have provided is a trivial one. It can be solved, for example, by applying an ordinary function with a global variable that remembers the previous row's values when calculating the current row, or by simplifying it drastically, as another answer proposed for fun. I suppose you are after something more sophisticated than just using values from the previous row to calculate the new one, as that can be done without recursion.
The code below can, with slight modifications such as using B[level:] instead of B[level], also be used when each row actually needs as many calculations as there are rows preceding it:
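def rC(A, B, C, level=-1, max_level=None):
    level += 1  # move on to the next row
    if not level:
        # first row: no previous row exists, so use its own starting C
        max_level = len(A)-1
        B[level] = C[level] / A[level]
        C[level] = A[level] * B[level] + A[level]
    else:
        # later rows: use the C just calculated for the previous row
        B[level] = C[level-1] / A[level]
        C[level] = A[level] * B[level] + A[level]
    if level == max_level:
        return (B, C)
    return rC(A,B,C,level, max_level)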
data['B'], data['C'] = rC( data['A'], data['B'], data['C'] )
print( data )
prints:
A B C
0 1 3.00 4
1 2 2.00 6
2 3 2.00 9
3 4 2.25 13
4 5 2.60 18
Full code for check:
import pandas as pd
d = {'A':[1,2,3,4,5], 'B':[2,2,2,2,2], 'C':[3,3,3,3,3]}
data = pd.DataFrame(data=d)
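def rC(A, B, C, level=-1, max_level=None):
    level += 1  # move on to the next row
    if not level:
        # first row: no previous row exists, so use its own starting C
        max_level = len(A)-1
        B[level] = C[level] / A[level]
        C[level] = A[level] * B[level] + A[level]
    else:
        # later rows: use the C just calculated for the previous row
        B[level] = C[level-1] / A[level]
        C[level] = A[level] * B[level] + A[level]
    if level == max_level:
        return (B, C)
    return rC(A,B,C,level, max_level)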
data['B'], data['C'] = rC( data['A'], data['B'], data['C'] )
print( data )
If you want this code to work with other formulas, note that a value from the preceding row is addressed with [level-1] and a value from the row just calculated with [level].
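Because the recursion goes one level deeper per row, it would presumably hit Python's default recursion limit (1000) on longer frames. As a rough sketch of the same previous-row/current-row idea without recursion or an explicit for loop, itertools.accumulate can carry the previous row's result forward; the step function and the seed tuple are hypothetical names introduced here, not part of the answer's code:

```python
from itertools import accumulate

import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 2, 2, 2, 2], 'C': [3, 3, 3, 3, 3]})

def step(prev, a):
    """Compute (B, C) for one row from the previous row's (B, C) and the current A."""
    _, prev_c = prev          # only the previous C is needed by these formulas
    b = prev_c / a
    return (b, a * b + a)

# Seed with the starting C; the seed's B slot is a placeholder that is never read.
seed = (None, data['C'].iloc[0])
rows = list(accumulate(data['A'], step, initial=seed))[1:]  # drop the seed itself

# Split the list of (B, C) pairs back into two columns.
data['B'], data['C'] = map(list, zip(*rows))

print(data)
```

To adapt it to other formulas, only the body of step needs to change: prev plays the role of [level-1], and b and the returned tuple play the role of [level].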