I have latitudes, longitudes, and addresses in a pandas DataFrame. A user inputs an address, and I'd like to look up the associated details in the DataFrame based on the address's lat/long. Here's my code:
import pandas as pd
df_geo = pd.DataFrame({'Address': ['Addr1','Addr2','Addr3'],
'Value': [100, 101, 103],
'Lat': [33.515226, 33.51529, 33.515230],
'Long': [-112.094456, -112.094459, -112.094464]})
I geocode the address using an API and obtain a [lat, long] list:
[33.515227, -112.094457]
How do I find the matching or nearest coordinates in the pandas DataFrame and pull the Address and Value fields? We already have the geocoding API. The DataFrame can be fairly large, so I'm looking for an efficient solution, using one of the Python geo libraries if possible.
CodePudding user response:
Use BallTree from sklearn:
import pandas as pd
import numpy as np
from sklearn.neighbors import BallTree
df_geo = pd.DataFrame({'Address': ['Addr1','Addr2','Addr3'],
'Value': [100, 101, 103],
'Lat': [33.515226, 33.51529, 33.515230],
'Long': [-112.094456, -112.094459, -112.094464]})
coords = [33.515227, -112.094457]
X = np.deg2rad(df_geo[['Lat', 'Long']].values)
y = np.deg2rad(np.array([coords]))
tree = BallTree(X, leaf_size=2)
dist, ind = tree.query(y)
Output:
>>> df_geo[['Address', 'Value']].iloc[ind[0][0]].tolist()
['Addr1', 100]
>>> dist
array([[2.46826831e-08]])
>>> ind
array([[0]])
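BallTree uses a Euclidean (Minkowski, p=2) metric by default, so the distance above is a plain Euclidean distance between the radian values, not a great-circle distance. Here is a minimal sketch of the same lookup with metric='haversine', which returns a great-circle distance in radians that can be scaled to metres. The Earth radius constant and the names EARTH_RADIUS_M, dist_rad, dist_m and nearest are assumptions added for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

# Assumed mean Earth radius in metres (an approximation, used only for scaling).
EARTH_RADIUS_M = 6_371_000

df_geo = pd.DataFrame({'Address': ['Addr1', 'Addr2', 'Addr3'],
                       'Value': [100, 101, 103],
                       'Lat': [33.515226, 33.51529, 33.515230],
                       'Long': [-112.094456, -112.094459, -112.094464]})

# The haversine metric expects [lat, lon] pairs in radians, in that order.
X = np.deg2rad(df_geo[['Lat', 'Long']].to_numpy())
tree = BallTree(X, metric='haversine')

# Query point: the geocoded [lat, long] from the question.
query = np.deg2rad(np.array([[33.515227, -112.094457]]))
dist_rad, ind = tree.query(query, k=1)

# Row of the nearest neighbour and its great-circle distance in metres.
nearest = df_geo.iloc[ind[0][0]]
dist_m = dist_rad[0][0] * EARTH_RADIUS_M
```

For a large DataFrame, build the tree once and reuse it for every query. If it helps to reject far-away matches, dist_m can be compared against a cutoff of your choosing.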
CodePudding user response:
IIUC, use numpy.isclose. With its default tolerances (rtol=1e-05, atol=1e-08), all the sample coordinates count as close, so the solution below pulls all records.
In [862]: import numpy as np
In [863]: lat_long = [33.515227, -112.094457]
In [870]: df_geo[np.isclose(df_geo[['Lat', 'Long']], lat_long)].drop_duplicates()[['Address', 'Value']]
Out[870]:
  Address  Value
0   Addr1    100
1   Addr2    101
2   Addr3    103
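If only near-exact matches are wanted, one option is to tighten the tolerance and require both coordinates to match. This is a sketch under assumptions: the atol value of 2e-6 degrees is an illustrative choice, and the names close and matches are hypothetical.

```python
import numpy as np
import pandas as pd

df_geo = pd.DataFrame({'Address': ['Addr1', 'Addr2', 'Addr3'],
                       'Value': [100, 101, 103],
                       'Lat': [33.515226, 33.51529, 33.515230],
                       'Long': [-112.094456, -112.094459, -112.094464]})
lat_long = [33.515227, -112.094457]

# rtol=0 disables the relative tolerance; atol is an absolute tolerance in degrees.
# 2e-6 degrees is an illustrative value (roughly 0.2 m of latitude).
close = np.isclose(df_geo[['Lat', 'Long']].to_numpy(), lat_long, rtol=0, atol=2e-6)

# Keep a row only when both Lat and Long are within tolerance.
matches = df_geo.loc[close.all(axis=1), ['Address', 'Value']]
```

Reducing the mask with all(axis=1) gives one boolean per row, so drop_duplicates() is no longer needed. This is still a linear scan over the DataFrame and can return no rows at all, so it works as an exact-match filter but not as a nearest-neighbour search.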