Home > database >  Merge two dataframes on nearest value while duplicating rows
Merge two dataframes on nearest value while duplicating rows

Time:11-26

I have two dataframes,

DF1 = NUM1     Car         COLOR 
      100     Honda       blue
      100     Honda       yellow
      200     Volvo       red

DF2 = NUM2     Car         STATE
      110     Honda        good
      110     Honda        bad
      230     Volvo        not bad
      230     Volvo        excellent

I want to merge them on nearest value in columns NUM1 & NUM2 in order to get this desired dataframe:

DF3 = NUM     CAR      COLOR     STATE
      100     HONDA     blue      good
      100     HONDA     blue      bad
      100     HONDA     yellow    good
      100     HONDA     yellow    bad
      200     VOLVO     red       not bad
      200     VOLVO     red       excellent

I've tried this:

df3 = pd.merge_asof(df1, df2, left_on="NUM1", right_on="NUM2")

But this is the result I get:

DF3 =     NUM     CAR      COLOR     STATE
          100     HONDA     blue      good
          100     HONDA     yellow    good
          200     VOLVO     red       not bad

CodePudding user response:

IIUC, you might need to combine merge_asof and merge:

key = pd.merge_asof(DF1.reset_index().sort_values(by='NUM1'),
                    DF2['NUM2'],
                    left_on='NUM1', right_on='NUM2',
                    direction='nearest')['NUM2']

DF1.merge(DF2.drop(columns=DF1.columns.intersection(DF2.columns)),
          left_on=key, right_on='NUM2')
  • Related