Pandas create column that shows absolute max value of the row, but keeps negatives

Time:04-22

I have a df that looks like this:

         a      b      c
124    -3.09  -0.38   2.34
2359    4.81   0.51  -1.53
56555  -4.34  -0.64   2.31
96786  -3.33  -3.34  -7.62

I want to calculate the absolute max value of each row in a new column that keeps negatives as negatives. The closest I've gotten is with the following:

df['new_column'] = df.abs().max(axis = 1)

new_column
3.09
4.81
4.34
7.62

But I need the new column to keep the negative signs—i.e. to look like this:

new_column
-3.09
 4.81
-4.34
-7.62

I've attempted a few things using abs().idxmax(), and am wondering if I need to find the location of the absolute max value, and then return the value in that location in the new column—just not sure how to do this. Thoughts?

CodePudding user response:

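Here's one way using two steps: first, find the absolute max of each row. Then use ne to check whether every original value in the row differs from that absolute max (which means the largest-magnitude value was negative), and use the boolean result as the exponent of -1 to get the sign: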

row_max = df.abs().max(axis=1)
df['new_column'] = row_max * (-1) ** df.ne(row_max, axis=0).all(axis=1)

Another option is to take the plain row max and use mask to replace it with -row_max wherever it is smaller than the absolute max:

df['new_column'] = df.max(axis=1).mask(lambda x: x < row_max, -row_max)

Output:

          a     b     c  new_column
124   -3.09 -0.38  2.34       -3.09
2359   4.81  0.51 -1.53        4.81
56555 -4.34 -0.64  2.31       -4.34
96786 -3.33 -3.34 -7.62       -7.62
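
To make the sign trick in the first snippet easier to follow, here is a minimal sketch that spells out each intermediate step on the question's sample data. The names is_negative and sign are illustrative and not part of the answer, and the cast to int is an extra precaution rather than something the answer requires.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [-3.09, 4.81, -4.34, -3.33],
     "b": [-0.38, 0.51, -0.64, -3.34],
     "c": [2.34, -1.53, 2.31, -7.62]},
    index=[124, 2359, 56555, 96786],
)

# Step 1: largest magnitude in each row (always non-negative).
row_max = df.abs().max(axis=1)

# Step 2: True when no original value equals that magnitude,
# i.e. the largest-magnitude value in the row was negative.
is_negative = df.ne(row_max, axis=0).all(axis=1)

# Step 3: (-1) ** 1 == -1 and (-1) ** 0 == 1, so this yields the sign.
sign = (-1) ** is_negative.astype(int)

signed_max = row_max * sign
print(signed_max)
```

One thing to keep in mind: if a row contains both x and -x as its largest magnitude, this approach returns the positive value, because at least one entry equals the absolute max.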

CodePudding user response:

I like the original idea you thought of, keeping with the theme:

# setup
import pandas as pd

data = {'a': [-3.09, 4.81, -4.34, -3.33],
        'b': [-.38, .51, -.64, -3.34],
        'c': [2.34, -1.53, 2.31, -7.62]}

df = pd.DataFrame(data, index=[124, 2359, 56555, 96786])

instead of:

df['new_column'] = df.abs().max(axis = 1)

let's change it to return the column label of the absolute max instead of the value itself:

max_col = df.abs().idxmax(axis = 1)

from there we can use zip to pair each row label with its column, look up the original (signed) value with df.loc, and set the result as the new column:

df['new_column'] = [df.loc[row,col] for row, col in zip(df.index, max_col)]

results:

          a     b     c  new_column
124   -3.09 -0.38  2.34       -3.09
2359   4.81  0.51 -1.53        4.81
56555 -4.34 -0.64  2.31       -4.34
96786 -3.33 -3.34 -7.62       -7.62
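
For larger frames, the same "find the position, then look up the value" idea can be done without a Python-level loop. The following is a sketch of one possible alternative, not something either answer proposes: it assumes all columns are numeric and uses NumPy's argmax with integer-array indexing. The variable names values and pos are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [-3.09, 4.81, -4.34, -3.33],
     "b": [-0.38, 0.51, -0.64, -3.34],
     "c": [2.34, -1.53, 2.31, -7.62]},
    index=[124, 2359, 56555, 96786],
)

# Work on the underlying 2-D array (assumes every column is numeric).
values = df.to_numpy()

# Column position of the largest magnitude in each row.
pos = np.abs(values).argmax(axis=1)

# Pick one element per row: row i, column pos[i]. The sign is preserved
# because the lookup is done on the original values, not the absolute ones.
df["new_column"] = values[np.arange(len(df)), pos]
print(df)
```

Like idxmax, argmax returns the first position when several entries in a row share the largest magnitude.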