Home > Mobile >  Pandas: Find maximum value in row and retrieve it's column position
Pandas: Find maximum value in row and retrieve it's column position

Time:11-03

Morning!

I want to find the maximum value for each row of my dataset.

Example:

ID Age Col_A Col_B Col_C
1 20 1 5 3
2 NaN 6 8 9
3 25 5 6 7
4 30 3 4 6
5 NaN 6 2 1
6 27 1 8 4

Then, for ID #2 knowing that the max value from the subset [Col_A; Col_B; Col_C] is 9 from Col_C I want to input the missing value as the mean Age of the ID's which also had the max value in Col_C.

In this case, the Age of ID #2 will be the average from Age ID#3 and ID#4 which will be 28.

CodePudding user response:

You can create a temporary column which shows which is the max column for each ID using idxmax and perform it column-wise, (axis=1), using only the Col_ columns.

Then impute the missing Age with a grouped average on the new column, using fillna and groupby.transform:

df['max_col'] = df.filter(like='Col_').idxmax(axis=1)
df['Age_filled'] = round(df['Age'].fillna(df.groupby('max_col')['Age'].transform('mean')))

Prints:

  ID   Age  Col_A  Col_B  Col_C max_col
0   1  20.0      1      5      3   Col_B
1   2  28.0      6      8      9   Col_C
2   3  25.0      5      6      7   Col_C
3   4  30.0      3      4      6   Col_C
4   5   NaN      6      2      1   Col_A
5   6  27.0      1      8      4   Col_B

For ID = 5, there is no other ID which has the maximum value in Col_A. So for such occasions, it is still left np.nan

  • Related