I have two data frames that both contain coordinates. One of them, df1
, has coordinates at a better resolution (with decimals), and I would like to join it to df2
which has a less-good resolution:
import pandas as pd
df1 = pd.DataFrame({'x': [1.1, 2.2, 3.3],
'y': [2.3, 3.3, 4.1],
'val': [10,11,12]})
df2 = pd.DataFrame({'x': [1,2,3,5.5],
'y': [2,3,4,5.6]})
df1['x_org']=df1['x']
df1['y_org']=df1['y']
df1[['x','y']] = df1[['x','y']].round()
df1 = pd.merge(df1, df2, how='left', on=['x','y'])
df1.drop({'x','y'}, axis=1)
# rename...
The code above does exactly what I want, but it is a bit cumbersome. Is there an easier way to achieve this?
CodePudding user response:
Use:
df1.merge(df2,
how='left',
left_on=[df1['x'].round(), df1['y'].round()],
right_on=['x','y'],
suffixes=('','_')).drop(['x_','y_'], axis=1)
Also is possible remove columns ending by _
dynamic:
df = df1.merge(df2,
how='left',
left_on=[df1['x'].round(), df1['y'].round()],
right_on=['x','y'],
suffixes=('','_')).filter(regex='.*[^_]$')
print (df)
x y val
0 1.1 2.3 10
1 2.2 3.3 11
2 3.3 4.1 12
df = df1.merge(df2,
how='left',
left_on=[df1['x'].round(), df1['y'].round()],
right_on=['x','y'],
suffixes=('','_end')).filter(regex='.*(?<!_end)$')
print (df)
x y val
0 1.1 2.3 10
1 2.2 3.3 11
2 3.3 4.1 12
Or:
df = (df1.set_index(['x','y'], drop=False).rename(lambda x: round(x))
.merge(df2.set_index(['x','y']),
left_index=True,
right_index=True,
how='left').reset_index(drop=True))
print (df)
x y val
0 1.1 2.3 10
1 2.2 3.3 11
2 3.3 4.1 12
CodePudding user response:
IIUC, you could pass the rounded values as joining keys:
pd.merge(df1.rename(columns={'x': 'x_org', 'y': 'y_org'}),
df2,
how='left',
left_on=[df1['x'].round(), df1['x'].round()],
right_on=['x', 'y'])#.drop({'x','y'}, axis=1) # if x/y are unwanted
output:
x_org y_org val x y
0 1.1 2.3 10 1.0 1.0
1 2.2 3.3 11 2.0 2.0
2 3.3 4.1 12 3.0 3.0