Python pandas how to sum values by accumulation while zeroing when changing the sign ( ,-)-CodePudding

I have a csv file with some calculations that looks something like this:

Value1
-1
-4
-5
-2
-3
-6
1
7
5
8
2
-1
2
-3

I would like to add a new column to it with a cumulative calculation that takes into account the sign in the Value1 column so that something like this would turn out:

Value1	Value2
-1	-1
-4	-5
-5	-10
-2	-12
-3	-15
-6	-21
1	1
7	8
5	13
8	21
2	23
-1	-1
2	2
-3	-3

That is, for example, while there is a negative value in the Value 1 column, there is an addition in the Value2 column (x (-x1)) where x is the value Value1 and -x1 is the previous value in the column Value2 and when the sign in the column Value 1 is changed, the calculation begins anew

Is this possible with Python and Pandas?

CodePudding user response：

First identify where the rows change sign with a mask. Then groupby them and use cumsum:

pos_or_neg = df['Value1'] >= 0
groups = pos_or_neg.diff().ne(0).cumsum()
print(groups)

The groups look like this:

0     1
1     1
2     1
3     1
4     1
5     1
6     2
7     2
8     2
9     2
10    2
11    3
12    4
13    5
Name: Value2, dtype: int32

Finally:

df['Value2'] = df.groupby(groups)['Value1'].cumsum()
print(df)

Output:

    Value1  Value2
0       -1      -1
1       -4      -5
2       -5     -10
3       -2     -12
4       -3     -15
5       -6     -21
6        1       1
7        7       8
8        5      13
9        8      21
10       2      23
11      -1      -1
12       2       2
13      -3      -3

CodePudding user response：

You could try as follows:

import pandas as pd

# read your csv
fname = 'mypath/myfile.csv'
df = pd.read_csv(fname)

# pd.Series with False for neg vals / True for pos vals
positives = df.Value1 >= 0

# pd.Series with consecutive count groups
sequences = (positives != positives.shift()).cumsum()

# now perform `groupby` on these groups, and assign to df.Value2
df['Value2'] = df.groupby(sequences)['Value1'].cumsum()

print(df)

    Value1  Value2
0       -1      -1
1       -4      -5
2       -5     -10
3       -2     -12
4       -3     -15
5       -6     -21
6        1       1
7        7       8
8        5      13
9        8      21
10       2      23
11      -1      -1
12       2       2
13      -3      -3

# save to csv again
df.to_csv(fname, index=False)

CodePudding user response：

Another possible solution is to use run-length encoding via package python-rle:

import rle

r = rle.encode(np.sign(df['Value1']))

df['Value2'] = df.groupby(rle.decode([*range(1, len(r[0]) 1)], r[1])).cumsum()

Output:

    Value1  Value2
0       -1      -1
1       -4      -5
2       -5     -10
3       -2     -12
4       -3     -15
5       -6     -21
6        1       1
7        7       8
8        5      13
9        8      21
10       2      23
11      -1      -1
12       2       2
13      -3      -3