how to lookup approximate in DataFrame Pandas?-CodePudding

I have a lab with 2 dataframes as below:

df1 = df1.head(10)
        HT  D
    0   A   BL
    1   B   0
    2   C   NaN
    3   D   2
    4   E   3
    5   F   NaN
    6   G   7


rng.head(10)
    Range   Group
0   0       group A
1   3       Group B
2   5       Group C
3   8       Group D
4   11      Group E

how I can find approximate value from column df['D'] in column rng['Range'] , sth like vlookup approximate in Excel. My output expected as below:

    HT  D    Group
0   A   BL   
1   B   0    GroupA
2   C   NaN
3   D   2    GroupA
4   E   3    GroupB
5   F   NaN  
6   G   7    Group D

CodePudding user response：

You can use merge_asof to create an intermediate, then join on the index. Your particular case is made more complex as you have mixed types in 'D', so you first need to subselect the numbers.

s = pd.to_numeric(df1['D'], errors='coerce')
df2 = pd.merge_asof(s[s.notna()].astype(int).reset_index(), rng,
                    left_on='D', right_on='Range')

out = df1.join(df2.drop(columns='D').set_index('index'))

output:

  HT    D  Range    Group
0  A   BL    NaN      NaN
1  B    0    0.0  group A
2  C  NaN    NaN      NaN
3  D    2    0.0  group A
4  E    3    3.0  Group B
5  F  NaN    NaN      NaN
6  G    7    5.0  Group C

CodePudding user response：

You could define a function that returns the closest group for a given value of d and then use that to assign a group to every row in df1

def find_group(d):
    if np.isnan(d):
        return ''
    return rng.iloc[(rng['Range']-d).abs().argmin()]['Group']

df2 = df1.copy()
df2['Group'] = pd.to_numeric(df2['D'], errors='coerce').apply(lambda x: find_group(x))

This produces the output:

>>> print(df2)
  HT    D   Group
0  A   BL        
1  B    0  groupA
2  C  NaN        
3  D    2  GroupB
4  E    3  GroupB
5  F  NaN        
6  G    7  GroupD

EDIT: I believe the Excel lookup doesn't match the closest row but rather the largest value that is less than the lookup value. You can implement that logic in the find_group function as well:

def find_group(d):
    if np.isnan(d):
        return ''
    return rng[(d >= rng['Range'])].iloc[-1]['Group']