Home > front end >  Dataframe summing all values of column bigger than 0 and add lowest value of other column to all of
Dataframe summing all values of column bigger than 0 and add lowest value of other column to all of

Time:10-26

I want to sum all values greater than zero in column 2 of the dataframe. This sum should next be assigned to all the rows that were summed in a new column (3). See the example below:

Dataframe:
     Column 1 Column 2
0       15        0.2
1       11        0
2       13        0.4
3       19        0.4
4       14        0.4
5       2         0 
6       1         0
7       15        0.4
8       3         0

Required result:
     Column 1 Column 2 Column 3
0       15        0.2    0.2
1       11        0      0
2       13        0.4    1.2
3       19        0.4    1.2
4       14        0.4    1.2
5       2         0      0 
6       1         0      0
7       15        0.4    0.4 
8       3         0      0

Next another column (4) should be added which contains the lowest value for each of these blocks out of column 1. The rows with a zero in column 2 can just pas on the value of column 1:

     Column 1 Column 2 Column 3
0       15        0.2    0.2
1       11        0      0
2       13        0.4    1.2
3       19        0.4    1.2
4       14        0.4    1.2
5       2         0      0 
6       1         0      0
7       15        0.4    0.4 
8       3         0      0

Required result:
     Column 1 Column 2 Column 3 Column 4
0       15        0.2    0.2       15
1       11        0      0         11
2       13        0.4    1.2       13 
3       19        0.4    1.2       13 
4       14        0.4    1.2       13 
5       2         0      0         2
6       1         0      0         1
7       15        0.4    0.4       15
8       3         0      0         3

Who can give me the most elegant solution? Thanking you all in advance!

CodePudding user response:

Let's try

m = df['Column 2'].eq(0)

df['Column 3'] = (df['Column 2'].mask(m, 0)              # convert value with True to 0
                  .groupby(m.cumsum()).transform('sum')  # sum value in each group
                  .mask(m, 0))                           # convert value with True to 0

df['Column 4'] = (df['Column 1'].mask(m, float('inf'))
                  .groupby(m.cumsum()).transform('min')
                  .mask(m, df['Column 1']))
print(df)

   Column 1  Column 2  Column 3  Column 4
0        15       0.2       0.2      15.0
1        11       0.0       0.0      11.0
2        13       0.4       1.2      13.0
3        19       0.4       1.2      13.0
4        14       0.4       1.2      13.0
5         2       0.0       0.0       2.0
6         1       0.0       0.0       1.0
7        15       0.4       0.4      15.0
8         3       0.0       0.0       3.0
  • Related