Home > database >  Get values after idxmin(1)
Get values after idxmin(1)

Time:06-24

I apply idxmin() to get the index of the minimum absolute value of two columns:

rep["OffsetFrom"] = rep[["OffsetDates", "OffsetDays"]].abs().idxmin(axis=1).dropna()
    OffsetDates OffsetDays  OffsetFrom
0       0.0     0.0         OffsetDates
1       1.0     1.0         OffsetDates
2       4.0     -3.0        OffsetDays
3       4.0     -3.0        OffsetDays
4       6.0     -1.0        OffsetDays
... ... ... ...
1165    0.0     0.0         OffsetDates
1166    0.0     0.0         OffsetDates
1167    0.0     0.0         OffsetDates
1168    0.0     0.0         OffsetDates
1169    0.0     0.0         OffsetDates
1170 rows × 3 columns

How do I get a column of the actual values after that?
PS. This one wouldn't work, since I need to keep the original signs:

rep[["OffsetFrom", "Offset"]] = rep[["OffsetDates", "OffsetDays"]].abs().agg(['idxmin','min'], axis=1)

Edit: Eventually, I've simply done it with apply:

def get_abs_min_keep_sign(x):
    return min(x.min(), x.max(), key=abs)

rep["Offset"] = rep[["OffsetDates", "OffsetDays"]].apply(get_abs_min_keep_sign, axis=1)

However, there must be a more elegant approach out there.

CodePudding user response:

First idea is use lookup by column Offset:

rep["Offset"] = rep[["OffsetDates", "OffsetDays"]].abs().idxmin(axis=1)
    
idx, cols = pd.factorize(rep['Offset'])
rep["OffsetVal"] = rep.reindex(cols, axis=1).to_numpy()[np.arange(len(rep)), idx]
     

Another numpy solution is use numpy.argmin for positions by absolute values, then is possible use indexing by array by columns names for Offset and for values is used indexing by rep[["OffsetDates", "OffsetDays"]] converted to 2d array:

cols = ["OffsetDates", "OffsetDays"]

pos = np.argmin(rep[cols].abs().to_numpy(), axis=1)
rep["Offset2"] = np.array(cols)[pos]
rep["OffsetVal2"] = rep[cols].to_numpy()[np.arange(len(rep.index)),pos]
print (rep)
   OffsetDates  OffsetDays       Offset  OffsetVal      Offset2  OffsetVal2
0          0.0         0.0  OffsetDates        0.0  OffsetDates         0.0
1          1.0         1.0  OffsetDates        1.0  OffsetDates         1.0
2          4.0        -3.0   OffsetDays       -3.0   OffsetDays        -3.0
3          4.0        -3.0   OffsetDays       -3.0   OffsetDays        -3.0
4          6.0        -1.0   OffsetDays       -1.0   OffsetDays        -1.0
    

Your function should be simplify:

rep["OffsetVal"] = rep[["OffsetDates","OffsetDays"]].apply(lambda x: min(x,key=abs),axis=1)

CodePudding user response:

You need check adding to_numpy to remove the column impact and with lambda

df[["OffsetFrom", "Offset"]] = df[["OffsetDates", "OffsetDays"]].agg([lambda x : x.abs().idxmin(),'min'], axis=1).to_numpy()
df
Out[889]: 
   OffsetDates  OffsetDays   OffsetFrom  Offset
0          0.0         0.0  OffsetDates     0.0
1          1.0         1.0  OffsetDates     1.0
2          4.0        -3.0   OffsetDays    -3.0
3          4.0        -3.0   OffsetDays    -3.0
4          6.0        -1.0   OffsetDays    -1.0

Panda dataframe assign will match the index and column , when the column different, will return NaN

  • Related