Suppose I have a data frame that looks like this:
col1 col2
0 10
1 23
2 21
3 15
I want to subtract from each value in col2 the result already computed for the previous row, so that every step uses the previously subtracted value rather than the original one. The expected result is:
col1 col2
0 10 # left unchanged as index == 0
1 13 # 23 - 10
2 8 # 21 - 13
3 7 # 15 - 8
Other solutions that I have found all subtract the previous values as is, and not the new subtracted value. I would like to avoid using for loops as I have a very large dataset.
CodePudding user response:
Expand the recurrence to see what 'previously subtracted' means (a is the original col2, b is the new column, and b1 = a1):
b2 = a2 - a1
b3 = a3 - b2 = a3 - a2 + a1
b4 = a4 - b3 = a4 - a3 + a2 - a1
b5 = a5 - b4 = a5 - a4 + a3 - a2 + a1
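The pattern above can be written as a single alternating sum. This is only a restatement of the expansion, with n and k introduced here as row positions starting at 1:

```latex
b_1 = a_1, \qquad
b_n = a_n - b_{n-1} = \sum_{k=1}^{n} (-1)^{\,n-k}\, a_k
    = (-1)^{n} \sum_{k=1}^{n} (-1)^{k}\, a_k .
```

The sign of each term depends only on the product of a per-row sign and a per-column sign, which is why an outer product of an alternating sign vector with itself reproduces it.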
So we just do
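import numpy as np
s = np.arange(len(df)) % 2   # 0, 1, 0, 1, ...
s = s + s - 1                # alternating signs: -1, 1, -1, 1, ...
# Lower-triangular matrix of sign products, applied to col2
df['new'] = np.tril(np.multiply.outer(s, s)).dot(df.col2)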
Out[47]: array([10, 13, 8, 7])
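The outer-product matrix has len(df) squared entries, which may be too much memory for a very large dataset. As a hedged alternative, not part of the answer above, the same alternating sum can probably be computed in linear time with a cumulative sum. The name `signs` and the sample frame are assumptions made for this sketch:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"col1": [0, 1, 2, 3], "col2": [10, 23, 21, 15]})

# One sign per row: +1, -1, +1, -1, ...
signs = np.where(np.arange(len(df)) % 2 == 0, 1, -1)

# b_n = sign_n * cumsum(sign_k * a_k): flip signs, accumulate, flip back
df["new"] = signs * np.cumsum(signs * df["col2"].to_numpy())

print(df["new"].tolist())  # [10, 13, 8, 7]
```

Multiplying by `signs` before and after the cumulative sum is what turns a plain running total into the alternating one, so no n-by-n matrix is built.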
CodePudding user response:
Below is a simple pure Pandas approach (no numpy import needed) that is conceptually more straightforward and easy to understand from the code alone, without additional explanation:
Let's first define a function which will do the required work:
def ssf(val):
    global last_val
    last_val = val - last_val
    return last_val
With the function above, the new column is created by:
last_val = 0  # must be reset to 0 before every run, since ssf() updates this global
df['new'] = df.col2.apply(ssf)
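Keep in mind that .apply() still calls the function once per row in Python, so it is a loop in disguise. If the global variable is a concern, the same running subtraction can be sketched with `itertools.accumulate` from the standard library. This is an assumed variation, not the code from this answer, and it is also a Python-level loop:

```python
from itertools import accumulate

import pandas as pd

# Sample data from the question
df = pd.DataFrame({"col1": [0, 1, 2, 3], "col2": [10, 23, 21, 15]})

# accumulate() passes the previous result and the current value to the function;
# the first value is kept unchanged
df["new"] = list(accumulate(df["col2"], lambda prev, cur: cur - prev))

print(df["new"].tolist())  # [10, 13, 8, 7]
```

Because the running state lives inside accumulate(), nothing has to be reset between runs.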
Let's compare the number of functions/methods used by the pure Pandas approach with the numpy one in the other answer.
The Pandas approach uses 2 functions/methods, ssf() and .apply(), and 1 operation: simple subtraction.
The numpy approach uses 5 functions/methods, .arange(), len(), .tril(), .multiply.outer() and .dot(), and 3 operations: array addition, array subtraction and modulo.