I have one data frame, df1 with values as given below.
df1
Var1
105.1290
52.7885
10.9922
22.8443
73.5880
96.2842
91.9474
41.6485
68.4402
89.7366
df2
Var2
84.95632934
97.58280346
Now, I replace the values in df1 with the values in df2 whose absolute difference is low like the ones marked below. diff1 is obtained by subtracting values in df1 from first value in df2 (i.e. 84.95632934). Similarly, diff2 is obtained by subtracting values in df1 from second value in df2 (i.e. 97.58280346). diff1 column shows that the difference is least for 89.7366 value in df1 so it is replaced with 84.95632934. Similarly diff2 shows the absolute difference is least for 96.2842 value in df1 so it is replaced with 97.58280346.
Var1 diff1 diff2
105.1290 20.1727 7.5462
52.7885 -32.1678 -44.7943
10.9922 -73.9641 -86.5906
22.8443 -62.1120 -74.7385
73.5880 -11.3683 -23.9948
96.2842* 11.3279 -1.2986
91.9474 6.9911 -5.6354
41.6485 -43.3078 -55.9343
68.4402 -16.5161 -29.1426
89.7366* 4.7803 -7.8462
The final dataframe as a result should be as shown in table df3. How can I obtain df3?
df3
Var1
105.1290
52.7885
10.9922
22.8443
73.5880
97.5828
91.9474
41.6485
68.4402
84.9563
CodePudding user response:
Use array broadcasting to compute the differences and set the values with .loc
and idxmin
:
other = df2["Var2"].to_numpy()
differences = pd.DataFrame(df1['Var1'].to_numpy()[:, None] - other).abs()
df1["Var1"].loc[differences.idxmin()] = other
>>> df1
Var1
0 105.129000
1 52.788500
2 10.992200
3 22.844300
4 73.588000
5 97.582803
6 91.947400
7 41.648500
8 68.440200
9 84.956329