Matching geographic coordinates between two data frames

I have two data frames, DF1 and DF2, each with longitude and latitude columns:

DF1 = pd.DataFrame([[19.827658,-20.372238,8614], [19.825407,-20.362608,7412], [19.081514,-17.134456,8121]], columns=['Longitude1', 'Latitude1','Echo_top_height'])
DF2 = pd.DataFrame([[19.083727, -17.151207, 285.319994], [19.169403, -17.154144, 284.349994], [19.081514,-17.154456, 285.349994]], columns=['Longitude2', 'Latitude2','BT'])

I need to match each longitude/latitude pair in DF1 with a longitude/latitude pair in DF2 and, where they match, add the corresponding value from the BT column of DF2 to DF1.

I used the code from here and managed to check whether there is a match:

import numpy as np
from sklearn.metrics.pairwise import haversine_distances

threshold = 5000  # meters
earth_radius = 6371000  # meters
DF1['nearby'] = (
    # get the distance between all points of each DF
    haversine_distances(
        # haversine_distances expects [lat, lon] in radians, hence *np.pi/180
        X=DF1[['Latitude1', 'Longitude1']].to_numpy()*np.pi/180,
        Y=DF2[['Latitude2', 'Longitude2']].to_numpy()*np.pi/180)
    # 1 if any DF2 point lies within the threshold of this DF1 point
    * earth_radius < threshold).any(axis=1).astype(int)

So the result I need would look like this:

Longitude1 Latitude1 Echo_top_height   BT
19.82       -20.37       8614         290.345
19.82       -20.36       7412         289.235
and so on...

CodePudding user response：

It looks like you are comparing the data frames by index, so you can use join (which aligns rows on the index) and then drop the unnecessary rows and columns:

DF3 = DF1.join(DF2[['BT']])
DF3 = DF3[DF3['nearby'].eq(1)].drop('nearby', axis=1)
DF3

Full reproducible code:

import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import haversine_distances
DF1 = pd.DataFrame([[19.827658,-20.372238,8614], [19.825407,-20.362608,7412], [19.081514,-17.134456,8121]], columns=['Longitude1', 'Latitude1','Echo_top_height'])
DF2 = pd.DataFrame([[19.083727, -17.151207, 285.319994], [19.169403, -17.154144, 284.349994], [19.081514,-17.154456, 285.349994]], columns=['Longitude2', 'Latitude2','BT'])
DF1, DF2
threshold = 5000 # meters
earth_radius = 6371000  # meters
DF1['nearby'] = (
# get the distance between all points of each DF
haversine_distances(
    # note that you need to convert to radians with *np.pi/180
    X=DF1[['Latitude1','Longitude1']].to_numpy()*np.pi/180, 
    Y=DF2[['Latitude2','Longitude2']].to_numpy()*np.pi/180)
*earth_radius < threshold).any(axis=1).astype(int)

DF3 = DF1.join(DF2[['BT']])
DF3 = DF3[DF3['nearby'].eq(1)].drop('nearby', axis=1)
DF3

Output:

Out[1]: 
   Longitude1  Latitude1  Echo_top_height          BT
2   19.081514 -17.134456             8121  285.349994
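
One caveat worth checking: join aligns rows by index, so the BT value attached to a DF1 row comes from the same row position in DF2, not necessarily from the closest point. A minimal sketch, assuming the goal is the BT of the nearest DF2 point within the threshold, takes the argmin of the same distance matrix. The names dist_m, nearest_idx, nearest_dist and matched are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_M = 6371000  # meters
THRESHOLD_M = 5000        # meters, same threshold as in the question

DF1 = pd.DataFrame(
    [[19.827658, -20.372238, 8614], [19.825407, -20.362608, 7412], [19.081514, -17.134456, 8121]],
    columns=['Longitude1', 'Latitude1', 'Echo_top_height'])
DF2 = pd.DataFrame(
    [[19.083727, -17.151207, 285.319994], [19.169403, -17.154144, 284.349994], [19.081514, -17.154456, 285.349994]],
    columns=['Longitude2', 'Latitude2', 'BT'])

# Pairwise great-circle distances in meters, shape (len(DF1), len(DF2)).
# haversine_distances expects [lat, lon] in radians.
dist_m = haversine_distances(
    np.radians(DF1[['Latitude1', 'Longitude1']].to_numpy()),
    np.radians(DF2[['Latitude2', 'Longitude2']].to_numpy()),
) * EARTH_RADIUS_M

# For each DF1 row: position of the closest DF2 row and its distance.
nearest_idx = dist_m.argmin(axis=1)
nearest_dist = dist_m.min(axis=1)

# Take BT from the closest DF2 row, but only if it lies within the threshold.
matched = DF1.copy()
matched['BT'] = np.where(nearest_dist < THRESHOLD_M,
                         DF2['BT'].to_numpy()[nearest_idx],
                         np.nan)
matched = matched.dropna(subset=['BT'])
print(matched)
```

With the sample data, only row 2 of DF1 has a neighbour within 5 km, and its closest DF2 point is the first row (BT 285.319994), roughly 1.9 km away.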

CodePudding user response：

You can use BallTree:

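import numpy as np
from sklearn.neighbors import BallTree

# Build the tree on DF2 (haversine expects [lat, lon] in radians)
coords = np.radians(DF2[['Latitude2', 'Longitude2']].to_numpy())
# passing the metric by name avoids importing DistanceMetric,
# whose module differs between scikit-learn versions
tree = BallTree(coords, metric='haversine')

# Query the nearest DF2 point for each DF1 point
coords = np.radians(DF1[['Latitude1', 'Longitude1']].to_numpy())
distances, indices = tree.query(coords, k=1)
DF1['BT'] = DF2['BT'].iloc[indices.flatten()].values
DF1['Distance'] = distances.flatten()  # in radians; multiply by 6371000 for meters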

Output:

Longitude1  Latitude1  Echo_top_height  BT      Distance
19.8277     -20.3722   8614             284.35  0.0572097
19.8254     -20.3626   7412             284.35  0.0570377
19.0815     -17.1345   8121             285.32  0.000294681
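
The Distance column here is in radians, and every DF1 row receives a BT value no matter how far away its nearest neighbour is. A sketch, assuming the 5000 m threshold from the question should still apply, converts the distances to meters and blanks out matches beyond the threshold. Distance_m, result and the two constants are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_M = 6371000  # meters
THRESHOLD_M = 5000        # meters, taken from the question

DF1 = pd.DataFrame(
    [[19.827658, -20.372238, 8614], [19.825407, -20.362608, 7412], [19.081514, -17.134456, 8121]],
    columns=['Longitude1', 'Latitude1', 'Echo_top_height'])
DF2 = pd.DataFrame(
    [[19.083727, -17.151207, 285.319994], [19.169403, -17.154144, 284.349994], [19.081514, -17.154456, 285.349994]],
    columns=['Longitude2', 'Latitude2', 'BT'])

# Tree over DF2 points; the haversine metric expects [lat, lon] in radians.
tree = BallTree(np.radians(DF2[['Latitude2', 'Longitude2']].to_numpy()),
                metric='haversine')

# Nearest DF2 neighbour for every DF1 point; distances come back in radians.
dist_rad, idx = tree.query(
    np.radians(DF1[['Latitude1', 'Longitude1']].to_numpy()), k=1)

result = DF1.copy()
result['Distance_m'] = dist_rad[:, 0] * EARTH_RADIUS_M
result['BT'] = DF2['BT'].to_numpy()[idx[:, 0]]

# Discard matches whose nearest neighbour is farther than the threshold.
result.loc[result['Distance_m'] > THRESHOLD_M, 'BT'] = np.nan
print(result)
```

For large inputs the tree avoids building the full pairwise distance matrix, which is the main reason to prefer it over haversine_distances.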