I want to sum all values greater than zero in column 2 of the dataframe. This sum should next be assigned to all the rows that were summed in a new column (3). See the example below:
Dataframe:
Column 1 Column 2
0 15 0.2
1 11 0
2 13 0.4
3 19 0.4
4 14 0.4
5 2 0
6 1 0
7 15 0.4
8 3 0
Required result:
Column 1 Column 2 Column 3
0 15 0.2 0.2
1 11 0 0
2 13 0.4 1.2
3 19 0.4 1.2
4 14 0.4 1.2
5 2 0 0
6 1 0 0
7 15 0.4 0.4
8 3 0 0
Next another column (4) should be added which contains the lowest value for each of these blocks out of column 1. The rows with a zero in column 2 can just pas on the value of column 1:
Column 1 Column 2 Column 3
0 15 0.2 0.2
1 11 0 0
2 13 0.4 1.2
3 19 0.4 1.2
4 14 0.4 1.2
5 2 0 0
6 1 0 0
7 15 0.4 0.4
8 3 0 0
Required result:
Column 1 Column 2 Column 3 Column 4
0 15 0.2 0.2 15
1 11 0 0 11
2 13 0.4 1.2 13
3 19 0.4 1.2 13
4 14 0.4 1.2 13
5 2 0 0 2
6 1 0 0 1
7 15 0.4 0.4 15
8 3 0 0 3
Who can give me the most elegant solution? Thanking you all in advance!
CodePudding user response:
Let's try
m = df['Column 2'].eq(0)
df['Column 3'] = (df['Column 2'].mask(m, 0) # convert value with True to 0
.groupby(m.cumsum()).transform('sum') # sum value in each group
.mask(m, 0)) # convert value with True to 0
df['Column 4'] = (df['Column 1'].mask(m, float('inf'))
.groupby(m.cumsum()).transform('min')
.mask(m, df['Column 1']))
print(df)
Column 1 Column 2 Column 3 Column 4
0 15 0.2 0.2 15.0
1 11 0.0 0.0 11.0
2 13 0.4 1.2 13.0
3 19 0.4 1.2 13.0
4 14 0.4 1.2 13.0
5 2 0.0 0.0 2.0
6 1 0.0 0.0 1.0
7 15 0.4 0.4 15.0
8 3 0.0 0.0 3.0