(Pandas) Is there a faster way to find the indexes of all values in a row which are equal to the max-CodePudding

I have a pandas dataframe Cellvoltage with multiple columns of float values. I'm trying to get the indices of all the values that equal the max value for every row.

So to achieve that, I am using the following code:

req_indices = np.argwhere(Cellvoltage.values == np.amax(Cellvoltage.values, axis=1).reshape(-1,1))
max_voltage_idx = [0]*len(req_indices)
for x,y in req_indices:
    if max_voltage_idx[x] == 0:
        max_voltage_idx[x] = [y]
    else:
        max_voltage_idx[x].append(y)
Cellvoltage['max_voltage_idx']  = pd.Series(max_voltage_idx).apply(np.array)

Is there a better/faster way to achive the same goal?

CodePudding user response：

Use df.eq(df.max(axis=1), axis=0) with where to mask the non-max values, then stack to remove the NaNs and get the index pairs.

out = df.where(df.eq(df.max(axis=1), axis=0)).stack().index.tolist()

Example input:


   A  B  C  D
0  1  2  3  3
1  4  1  1  4

Output:

[(0, 'C'), (0, 'D'), (1, 'A'), (1, 'D')]

If you want only the columns in a list as new column add a groupby.agg step:

df['max_voltage_idx']= (df                  
   .where(df.eq(df.max(axis=1), axis=0)).stack()
   .reset_index(-1).iloc[:, 0]
   .groupby(level=0).agg(list)
 )

Output:

   A  B  C  D max_voltage_idx
0  1  2  3  3          [C, D]
1  4  1  1  4          [A, D]

CodePudding user response：

Another possible solution:

indices = np.where(df.values == np.amax(df.values, axis=1)[:, None])
list(zip(indices[0], df.columns.values[indices[1]]))

Output:

[(0, 'C'), (0, 'D'), (1, 'A'), (1, 'D')]

Input:

The same @mozway (to whom I thank) uses:

   A  B  C  D
0  1  2  3  3
1  4  1  1  4