df1:
id | latlong_tuple |
---|---|
364 | (17.3946820179646, 78.042644262359) |
365 | (17.3945761480423, 78.0427466415321) |
1085 | (17.3950200947952, 78.0432334533569) |
1086 | (17.3947638830589, 78.0430426797909) |
1087 | (17.3945460707558, 78.0430666916614) |
df2:
index | latlong_tuple |
---|---|
01 | (17.431952, 78.37396) |
02 | (17.48295, 78.306694) |
03 | (17.479734, 78.34914) |
04 | (17.368366, 78.38604) |
05 | (17.433102, 78.37506) |
import haversine as hs
from haversine import Unit
from tqdm import tqdm

def tileId_mapping(sample_cord, tile_cord, tile):
    # For each sample coordinate, pick the tile whose coordinate is nearest.
    result = []
    for i in tqdm(range(0, len(sample_cord))):
        dis_list = []
        for j in range(0, len(tile_cord)):
            dis = hs.haversine(sample_cord[i], tile_cord[j], unit=Unit.METERS)
            dis_list.append(dis)
        shortest_dis = min(dis_list)
        min_index = dis_list.index(shortest_dis)
        result.append(tile[min_index])
    return result
This code is too slow when df1 has 320096 rows and df2 has 5299669 rows. Can someone please help me make it faster?
Thanks in advance.
CodePudding user response:
Maybe you can use haversine, like in this answer.
First of all, you should have two columns, one for your latitude and the other for your longitude.
You could use something like this:
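import pandas as pd

# index=... keeps the new columns aligned with the original rows
df1[['LAT', 'LONG']] = pd.DataFrame(df1['latlong_tuple'].to_list(), index=df1.index)
# same for the second dataframe
df2[['LAT', 'LONG']] = pd.DataFrame(df2['latlong_tuple'].to_list(), index=df2.index)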
Then, compute the vectorized haversine function like so:
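import numpy as np

# vectorized haversine function
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    Great-circle distance in km (for the default earth_radius).
    All (lat, lon) inputs must be numeric and broadcastable against each other.
    """
    if to_radians:
        # convert each argument separately so scalars and arrays can be mixed
        lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2.0) ** 2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

# distances (km) from the first df1 point to every df2 point
dist = haversine(df1.LAT.iloc[0], df1.LONG.iloc[0], df2.LAT.to_numpy(), df2.LONG.to_numpy())
nearest_row = df2.index[dist.argmin()]  # label of the closest df2 row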
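To get the nearest df2 point for every row of df1 rather than a single distance column, one option is to compute the vectorized haversine in chunks and take the argmin of each row. The following is only a sketch under assumptions: the names `haversine_matrix`, `nearest_tile_index`, `chunk_size` and `EARTH_RADIUS_M` are made up for illustration, and the inputs are assumed to be sequences of (lat, lon) pairs in degrees, like the `latlong_tuple` columns.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius, in metres


def haversine_matrix(lat1, lon1, lat2, lon2):
    """Pairwise distances in metres; rows follow points 1, columns points 2."""
    # Reshape so that broadcasting yields a (len(points1), len(points2)) matrix.
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    # Clip guards against rounding pushing a slightly outside [0, 1].
    return EARTH_RADIUS_M * 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_tile_index(sample_latlon, tile_latlon, chunk_size=16):
    """For each sample point, return the position of the closest tile point."""
    sample = np.asarray(sample_latlon, dtype=float)
    tiles = np.asarray(tile_latlon, dtype=float)
    nearest = np.empty(len(sample), dtype=np.intp)
    # Process a few sample points at a time to keep the distance matrix small.
    for start in range(0, len(sample), chunk_size):
        chunk = sample[start:start + chunk_size]
        dist = haversine_matrix(chunk[:, 0], chunk[:, 1], tiles[:, 0], tiles[:, 1])
        nearest[start:start + chunk_size] = dist.argmin(axis=1)
    return nearest


# Example with the first rows shown in the question.
samples = [(17.3946820179646, 78.042644262359), (17.3945761480423, 78.0427466415321)]
tiles = [(17.431952, 78.37396), (17.48295, 78.306694), (17.479734, 78.34914)]
positions = nearest_tile_index(samples, tiles)
print(positions)
```

The returned positions can then be used to look up the tile ids, for example with `df2.index[positions]`. Each temporary matrix holds `chunk_size * len(tiles)` floats, so `chunk_size` has to stay small when df2 has millions of rows. This removes the Python-level inner loop but still evaluates every pair; a spatial index such as scikit-learn's `BallTree` with `metric="haversine"` avoids that and would likely scale better for the sizes in the question.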
CodePudding user response:
Or use this package, which is optimized with pybind11: https://pypi.org/project/pyhaversine/