Home > Mobile >  Column as a sum of its cumulative value only if other column meets condition
Column as a sum of its cumulative value only if other column meets condition

Time:10-11

I am struggling to produce below df['res'] without loop / within pandas.

import pandas as pd
import numpy as np

df=pd.DataFrame()

df['a'] = [0,0,0,5,0,12,0,0,30]
df['b'] = [-2,0,0,-5,0,0,-2,-2,-100]
df['res'] = [0,0,0,3,0,7,0,0,26]
df['temp'] = [-2,-2,-2,-5,-5,0,-2,-4,-100]


df
Out[95]: 
    a    b  res  temp
0   0   -2    0    -2  # res = 0 since a = 0, temp = -2
1   0    0    0    -2  # res = 0 since a = 0, temp = -2
2   0    0    0    -2  # res = 0 since a = 0, temp = -2
3   5   -5    3    -5  # res = 3 since a = 5 and temp -2, reset temp, temp = -5
4   0    0    0    -5  # res = 0 since a = 0, temp = -5
5  12    0    7     0  # res = 7 since a = 12 and temp = -5, reset temp, temp = 0
6   0   -2    0    -2  # res = 0 since a = 0, temp = -2
7   0   -2    0    -4  # res = 0 since a = 0, temp = -4
8  30 -100   26  -100  # res = 26 since a = 30 and temp = -4, reset temp, temp = -100

Loop implementation of df['res']

result = [0]
temp = 0
for i in range(len(df)):
    if df['a'].iloc[i] != 0:
        result.append(temp   df['a'].iloc[i] - result[-1])
        temp = df['b'].iloc[i]
    else:
        result.append(0)
        temp  = df['b'].iloc[i]

df['result'] = result[1:]

In pandas, it could be something like

df['res'] = np.where(df['a'] == 0, 0,
            df['a'].cumsum()   df['b'].cumsum() - <sum of previous values in df['res']>)

The issue is that df['res'] is previously empty. Any hint how to think about these decompositions?

CodePudding user response:

As per your requirement, the value for temp will be reset as soon as we reach a 0 in a. So, I decided to first group your data set and then apply the rules. In a way that we group all rows in the same group up until there is a non-zero value for column a. Also all rows from one non-zero value for a till the next non-zero value for a. In this way we can make use of cumulative values of b for computing temp.:

import numpy as np
import pandas as pd

df['id'] = (~ df['a'].eq(0)).cumsum()
df['temp2'] = df.groupby('id')['b'].cumsum()
df['res2'] = np.where(df['a'].eq(0), 0, df['a']   df['temp2'].shift(fill_value=0))
df.drop(columns=['id'], inplace=True)


    a    b  res  temp  temp2  res2
0   0   -2    0    -2     -2     0
1   0    0    0    -2     -2     0
2   0    0    0    -2     -2     0
3   5   -5    3    -5     -5     3
4   0    0    0    -5     -5     0
5  12    0    7     0      0     7
6   0   -2    0    -2     -2     0
  • Related