I have dataframe that looks like the following:
Given Y1 eY1 Y2 eY2 Y3 eY3 Y4 eY4 Y5
0 0.45 0.25 0.3550 0.39 0.4200 0.43 0.5950 0.65 0.7175 0.74
1 0.39 0.15 0.2400 0.27 0.5025 0.58 0.7675 0.83 0.8600 0.87
2 0.99 0.30 0.4875 0.55 0.7225 0.78 0.9075 0.95 0.9800 0.99
3 0.58 0.23 0.2825 0.30 0.5550 0.64 0.7075 0.73 0.8725 0.92
4 NaN 0.25 0.3625 0.40 0.6175 0.69 0.8100 0.85 0.9250 0.95
My goal is simple: try to match the "given" value in each row to the closest column index (columns are sorted in ascending order and output the closest column index to a new column. I have been stuck on this for some time and would greatly appreciate any help/starting tips.
(for any "Nan" values in Given, I am outputting "none")
Thank you!
CodePudding user response:
First subtract all columns without Given
by column Given
by DataFrame.sub
with absolute values and then use DataFrame.idxmin
if not missing values in Given
:
df1 = df.drop('Given', 1).sub(df['Given'], axis=0).abs()
print (df1)
Y1 eY1 Y2 eY2 Y3 eY3 Y4 eY4 Y5
0 0.20 0.0950 0.06 0.0300 0.02 0.1450 0.20 0.2675 0.29
1 0.24 0.1500 0.12 0.1125 0.19 0.3775 0.44 0.4700 0.48
2 0.69 0.5025 0.44 0.2675 0.21 0.0825 0.04 0.0100 0.00
3 0.35 0.2975 0.28 0.0250 0.06 0.1275 0.15 0.2925 0.34
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN
df['new'] = np.where(df['Given'].isna(), None, df1.idxmin(axis=1))
print (df)
Given Y1 eY1 Y2 eY2 Y3 eY3 Y4 eY4 Y5 new
0 0.45 0.25 0.3550 0.39 0.4200 0.43 0.5950 0.65 0.7175 0.74 Y3
1 0.39 0.15 0.2400 0.27 0.5025 0.58 0.7675 0.83 0.8600 0.87 eY2
2 0.99 0.30 0.4875 0.55 0.7225 0.78 0.9075 0.95 0.9800 0.99 Y5
3 0.58 0.23 0.2825 0.30 0.5550 0.64 0.7075 0.73 0.8725 0.92 eY2
4 NaN 0.25 0.3625 0.40 0.6175 0.69 0.8100 0.85 0.9250 0.95 None