Home > Enterprise >  Pandas: Round integers before joining dataframes
Pandas: Round integers before joining dataframes

Time:03-09

I have two data frames that both contain coordinates. One of them, df1, has coordinates at a better resolution (with decimals), and I would like to join it to df2 which has a less-good resolution:

import pandas as pd

df1 = pd.DataFrame({'x': [1.1, 2.2, 3.3],
                    'y': [2.3, 3.3, 4.1],
                    'val': [10,11,12]})

df2 = pd.DataFrame({'x': [1,2,3,5.5],
                    'y': [2,3,4,5.6]})

df1['x_org']=df1['x']
df1['y_org']=df1['y']
df1[['x','y']] = df1[['x','y']].round()
df1 = pd.merge(df1, df2, how='left', on=['x','y'])

df1.drop({'x','y'}, axis=1)
# rename...

The code above does exactly what I want, but it is a bit cumbersome. Is there an easier way to achieve this?

CodePudding user response:

Use:

df1.merge(df2, 
          how='left', 
          left_on=[df1['x'].round(), df1['y'].round()], 
          right_on=['x','y'], 
          suffixes=('','_')).drop(['x_','y_'], axis=1)

Also is possible remove columns ending by _ dynamic:

df = df1.merge(df2, 
               how='left', 
               left_on=[df1['x'].round(), df1['y'].round()], 
               right_on=['x','y'], 
               suffixes=('','_')).filter(regex='.*[^_]$')

print (df)
     x    y  val
0  1.1  2.3   10
1  2.2  3.3   11
2  3.3  4.1   12

df = df1.merge(df2, 
               how='left', 
               left_on=[df1['x'].round(), df1['y'].round()], 
               right_on=['x','y'], 
               suffixes=('','_end')).filter(regex='.*(?<!_end)$')

print (df)
     x    y  val
0  1.1  2.3   10
1  2.2  3.3   11
2  3.3  4.1   12

Or:

df = (df1.set_index(['x','y'], drop=False).rename(lambda x: round(x))
         .merge(df2.set_index(['x','y']), 
                left_index=True,
                right_index=True, 
                how='left').reset_index(drop=True))
print (df)
     x    y  val
0  1.1  2.3   10
1  2.2  3.3   11
2  3.3  4.1   12

CodePudding user response:

IIUC, you could pass the rounded values as joining keys:

pd.merge(df1.rename(columns={'x': 'x_org', 'y': 'y_org'}),
         df2,
         how='left',
         left_on=[df1['x'].round(), df1['x'].round()],
         right_on=['x', 'y'])#.drop({'x','y'}, axis=1) # if x/y are unwanted

output:

   x_org  y_org  val    x    y
0    1.1    2.3   10  1.0  1.0
1    2.2    3.3   11  2.0  2.0
2    3.3    4.1   12  3.0  3.0
  • Related