Home > Blockchain >  Pandas: sum DataFrame column for max value only
Pandas: sum DataFrame column for max value only

Time:03-29

I have the following DataFrame:

df = pd.DataFrame({'a': [0.28, 0, 0.25, 0.85, 0.1],
                   'b': [0.5, 0.5, 0, 0.75, 0.1],
                   'c': [0.33, 0.7, 0.25, 0.2, 0.5],
                   'd': [0, 0.25, 0.2, 0.66, 0.1]})

Output:

      a     b     c     d
0  0.28  0.50  0.33  0.00
1  0.00  0.50  0.70  0.25
2  0.25  0.00  0.25  0.20
3  0.85  0.75  0.20  0.66
4  0.10  0.10  0.50  0.10

For each column, I want to sum the top n max values of the column, where n is determined by how many row max values that column contains.

For example, column b has a row max only in row 1, so its sum is the sum of the top 1 max values in that column, which is just 0.5 -- but column c has three row-maxes, located in rows 1, 2, and 4, so the top 3 max values of column c should be summed.

Expected output:

          a     b     c     d
0      0.28  0.50  0.33  0.00
1      0.00  0.50  0.70  0.25
2      0.25  0.00  0.25  0.20
3      0.85  0.75  0.20  0.66
4      0.10  0.10  0.50  0.10
count  1.10  0.50  1.45  0.00

CodePudding user response:

where

df.append(
    df.where(  # only look at values that are max for the row
        df.eq(  # compare max values to all values in row just in case there are more than 1
            df.max(axis=1),  # actually get max values
            axis=0
        )
    ).sum().rename('count')
)

          a     b     c     d
0      0.28  0.50  0.33  0.00
1      0.00  0.60  0.50  0.25
2      0.25  0.00  1.00  0.20
3      0.85  0.75  0.20  0.66
4      0.10  0.10  0.50  0.10
count  0.85  1.10  1.50  0.00

CodePudding user response:

The fastest way to do this would be to using the .max() method passing through the axis argument:

df.max(axis =1)

If you're after another column:

df['column_name'] = df.max(axis =1)

I didn't read the question that well!

  • Related