I apply idxmin()
to get the index of the minimum absolute value of two columns:
rep["OffsetFrom"] = rep[["OffsetDates", "OffsetDays"]].abs().idxmin(axis=1).dropna()
OffsetDates OffsetDays OffsetFrom
0 0.0 0.0 OffsetDates
1 1.0 1.0 OffsetDates
2 4.0 -3.0 OffsetDays
3 4.0 -3.0 OffsetDays
4 6.0 -1.0 OffsetDays
... ... ... ...
1165 0.0 0.0 OffsetDates
1166 0.0 0.0 OffsetDates
1167 0.0 0.0 OffsetDates
1168 0.0 0.0 OffsetDates
1169 0.0 0.0 OffsetDates
1170 rows × 3 columns
How do I get a column of the actual values after that?
PS. This one wouldn't work, since I need to keep the original signs:
rep[["OffsetFrom", "Offset"]] = rep[["OffsetDates", "OffsetDays"]].abs().agg(['idxmin','min'], axis=1)
Edit:
Eventually, I've simply done it with apply
:
def get_abs_min_keep_sign(x):
return min(x.min(), x.max(), key=abs)
rep["Offset"] = rep[["OffsetDates", "OffsetDays"]].apply(get_abs_min_keep_sign, axis=1)
However, there must be a more elegant approach out there.
CodePudding user response:
First idea is use lookup by column Offset
:
rep["Offset"] = rep[["OffsetDates", "OffsetDays"]].abs().idxmin(axis=1)
idx, cols = pd.factorize(rep['Offset'])
rep["OffsetVal"] = rep.reindex(cols, axis=1).to_numpy()[np.arange(len(rep)), idx]
Another numpy solution is use numpy.argmin
for positions by absolute values, then is possible use indexing by array by columns names for Offset
and for values is used indexing by rep[["OffsetDates", "OffsetDays"]]
converted to 2d array:
cols = ["OffsetDates", "OffsetDays"]
pos = np.argmin(rep[cols].abs().to_numpy(), axis=1)
rep["Offset2"] = np.array(cols)[pos]
rep["OffsetVal2"] = rep[cols].to_numpy()[np.arange(len(rep.index)),pos]
print (rep)
OffsetDates OffsetDays Offset OffsetVal Offset2 OffsetVal2
0 0.0 0.0 OffsetDates 0.0 OffsetDates 0.0
1 1.0 1.0 OffsetDates 1.0 OffsetDates 1.0
2 4.0 -3.0 OffsetDays -3.0 OffsetDays -3.0
3 4.0 -3.0 OffsetDays -3.0 OffsetDays -3.0
4 6.0 -1.0 OffsetDays -1.0 OffsetDays -1.0
Your function should be simplify:
rep["OffsetVal"] = rep[["OffsetDates","OffsetDays"]].apply(lambda x: min(x,key=abs),axis=1)
CodePudding user response:
You need check adding to_numpy
to remove the column impact and with lambda
df[["OffsetFrom", "Offset"]] = df[["OffsetDates", "OffsetDays"]].agg([lambda x : x.abs().idxmin(),'min'], axis=1).to_numpy()
df
Out[889]:
OffsetDates OffsetDays OffsetFrom Offset
0 0.0 0.0 OffsetDates 0.0
1 1.0 1.0 OffsetDates 1.0
2 4.0 -3.0 OffsetDays -3.0
3 4.0 -3.0 OffsetDays -3.0
4 6.0 -1.0 OffsetDays -1.0
Panda dataframe assign will match the index
and column
, when the column different, will return NaN