I have a pandas dataframe with this structure:
df_test
V | A | B | C | D
-10 | nan | nan | nan | nan
-9.9 | 10 | 1 | -2200 | 100
-9.8 | 11 | 2 | -2211 | 1
I would like to add a new column minimum_difference
containing the column name with the smallest absolute value for that specific row (ignoring V
), like this:
V | A | B | C | D | minimum_difference
-10 | nan | nan | nan | nan | nan
-9.9 | 10 | 1 | -2200 | 100 | B
-9.8 | 11 | 2 | -2211 | 1 | D
CodePudding user response:
You can use df.drop
to skip the V
column, and df.abs
to convert the dataframe's values to their absolute equivalents, and use df.idxmin
with axis=1
to find the index (which will be a column name) of the smallest (absolute) value:
df_test['minimum_difference'] = df_test.drop('V', axis=1).abs().idxmin(axis=1)
Output:
>>> df
V A B C D minimum_difference
0 -10.0 NaN NaN NaN NaN NaN
1 -9.9 10.0 1.0 -2200.0 100.0 B
2 -9.8 11.0 2.0 -2211.0 1.0 D
CodePudding user response:
You can check if a row
contains NaN
, then put NaN
, otherwise calculate the min
of the row and put the index.
In [101]: import numpy as np
In [102]: df['minimum_difference'] = np.where(df.isna().any(1), np.nan, df.abs().idxmin(1))
In [103]: df
Out[103]:
V A B C D minimum_difference
0 -10.0 NaN NaN NaN NaN NaN
1 -9.9 10.0 1.0 -2200.0 100.0 B
2 -9.8 11.0 2.0 -2211.0 1.0 D
CodePudding user response:
Here I am sharing Mayank's snippet with little update.
In [84]: df['minimum_difference'] = df.ix[:, 1:].abs().idxmin(1)
In [85]: df
Out[85]:
V A B C D minimum_difference
0 -10.0 NaN NaN NaN NaN NaN
1 -9.9 10.0 1.0 -2200.0 100.0 B
2 -9.8 11.0 2.0 -2211.0 200.0 B