I have a lab with 2 dataframes as below:
df1 = df1.head(10)
HT D
0 A BL
1 B 0
2 C NaN
3 D 2
4 E 3
5 F NaN
6 G 7
rng.head(10)
Range Group
0 0 group A
1 3 Group B
2 5 Group C
3 8 Group D
4 11 Group E
how I can find approximate value from column df['D'] in column rng['Range'] , sth like vlookup approximate in Excel. My output expected as below:
HT D Group
0 A BL
1 B 0 GroupA
2 C NaN
3 D 2 GroupA
4 E 3 GroupB
5 F NaN
6 G 7 Group D
CodePudding user response:
You can use merge_asof
to create an intermediate, then join on the index. Your particular case is made more complex as you have mixed types in 'D', so you first need to subselect the numbers.
s = pd.to_numeric(df1['D'], errors='coerce')
df2 = pd.merge_asof(s[s.notna()].astype(int).reset_index(), rng,
left_on='D', right_on='Range')
out = df1.join(df2.drop(columns='D').set_index('index'))
output:
HT D Range Group
0 A BL NaN NaN
1 B 0 0.0 group A
2 C NaN NaN NaN
3 D 2 0.0 group A
4 E 3 3.0 Group B
5 F NaN NaN NaN
6 G 7 5.0 Group C
CodePudding user response:
You could define a function that returns the closest group for a given value of d
and then use that to assign a group to every row in df1
def find_group(d):
if np.isnan(d):
return ''
return rng.iloc[(rng['Range']-d).abs().argmin()]['Group']
df2 = df1.copy()
df2['Group'] = pd.to_numeric(df2['D'], errors='coerce').apply(lambda x: find_group(x))
This produces the output:
>>> print(df2)
HT D Group
0 A BL
1 B 0 groupA
2 C NaN
3 D 2 GroupB
4 E 3 GroupB
5 F NaN
6 G 7 GroupD
EDIT: I believe the Excel lookup doesn't match the closest row but rather the largest value that is less than the lookup value. You can implement that logic in the find_group
function as well:
def find_group(d):
if np.isnan(d):
return ''
return rng[(d >= rng['Range'])].iloc[-1]['Group']