I have a csv file with some calculations that looks something like this:
Value1 |
---|
-1 |
-4 |
-5 |
-2 |
-3 |
-6 |
1 |
7 |
5 |
8 |
2 |
-1 |
2 |
-3 |
I would like to add a new column to it with a cumulative calculation that takes into account the sign in the Value1 column so that something like this would turn out:
Value1 | Value2 |
---|---|
-1 | -1 |
-4 | -5 |
-5 | -10 |
-2 | -12 |
-3 | -15 |
-6 | -21 |
1 | 1 |
7 | 8 |
5 | 13 |
8 | 21 |
2 | 23 |
-1 | -1 |
2 | 2 |
-3 | -3 |
That is, for example, while there is a negative value in the Value 1 column, there is an addition in the Value2 column (x (-x1)) where x is the value Value1 and -x1 is the previous value in the column Value2 and when the sign in the column Value 1 is changed, the calculation begins anew
Is this possible with Python and Pandas?
CodePudding user response:
First identify where the rows change sign with a mask. Then groupby them and use cumsum
:
pos_or_neg = df['Value1'] >= 0
groups = pos_or_neg.diff().ne(0).cumsum()
print(groups)
The groups look like this:
0 1
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 2
9 2
10 2
11 3
12 4
13 5
Name: Value2, dtype: int32
Finally:
df['Value2'] = df.groupby(groups)['Value1'].cumsum()
print(df)
Output:
Value1 Value2
0 -1 -1
1 -4 -5
2 -5 -10
3 -2 -12
4 -3 -15
5 -6 -21
6 1 1
7 7 8
8 5 13
9 8 21
10 2 23
11 -1 -1
12 2 2
13 -3 -3
CodePudding user response:
You could try as follows:
import pandas as pd
# read your csv
fname = 'mypath/myfile.csv'
df = pd.read_csv(fname)
# pd.Series with False for neg vals / True for pos vals
positives = df.Value1 >= 0
# pd.Series with consecutive count groups
sequences = (positives != positives.shift()).cumsum()
# now perform `groupby` on these groups, and assign to df.Value2
df['Value2'] = df.groupby(sequences)['Value1'].cumsum()
print(df)
Value1 Value2
0 -1 -1
1 -4 -5
2 -5 -10
3 -2 -12
4 -3 -15
5 -6 -21
6 1 1
7 7 8
8 5 13
9 8 21
10 2 23
11 -1 -1
12 2 2
13 -3 -3
# save to csv again
df.to_csv(fname, index=False)
CodePudding user response:
Another possible solution is to use run-length encoding
via package python-rle
:
import rle
r = rle.encode(np.sign(df['Value1']))
df['Value2'] = df.groupby(rle.decode([*range(1, len(r[0]) 1)], r[1])).cumsum()
Output:
Value1 Value2
0 -1 -1
1 -4 -5
2 -5 -10
3 -2 -12
4 -3 -15
5 -6 -21
6 1 1
7 7 8
8 5 13
9 8 21
10 2 23
11 -1 -1
12 2 2
13 -3 -3