Optimizing a shortest-distance calculation in Python

df1:

id	latlong_tuple
364	(17.3946820179646, 78.042644262359)
365	(17.3945761480423, 78.0427466415321)
1085	(17.3950200947952, 78.0432334533569)
1086	(17.3947638830589, 78.0430426797909)
1087	(17.3945460707558, 78.0430666916614)

df2:

index	latlong_tuple
01	(17.431952, 78.37396)
02	(17.48295, 78.306694)
03	(17.479734, 78.34914)
04	(17.368366, 78.38604)
05	(17.433102, 78.37506)

import haversine as hs
from haversine import Unit
from tqdm import tqdm

def tileId_mapping(sample_cord, tile_cord, tile):
    result = []
    for i in tqdm(range(0, len(sample_cord))):
        # distance from sample i to every tile coordinate
        dis_list = []
        for j in range(0, len(tile_cord)):
            dis = hs.haversine(sample_cord[i], tile_cord[j], unit=Unit.METERS)
            dis_list.append(dis)
        shortest_dis = min(dis_list)
        min_index = dis_list.index(shortest_dis)
        # id of the closest tile
        result.append(tile[min_index])
    return result

This code is too slow when the size of df1 is 320096 and the size of df2 is 5299669. Can someone please help me make it faster?

Thanks in advance.

CodePudding user response：

Maybe you can use a vectorized haversine function, like in this answer.

First of all, you should have two columns, one for latitude and one for longitude.

You could use something like this:

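import pandas as pd

# index= keeps the new columns aligned with the original rows
df1[['LAT', 'LONG']] = pd.DataFrame(df1['latlong_tuple'].to_list(), index=df1.index)

# same for the second dataframe
df2[['LAT', 'LONG']] = pd.DataFrame(df2['latlong_tuple'].to_list(), index=df2.index)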

Then define a vectorized haversine function and apply it like so:

import numpy as np

# vectorized haversine function (earth_radius is in km, so the result is in km)
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    All (lat, lon) coordinates must have numeric dtypes and be of equal length.
    """
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    
    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
    
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


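# Element-wise call: row i of df1 is paired with row i of df2, so both frames
# must have the same length. It does not search for the nearest point.
df1['dist'] = haversine(df1.LAT, df1.LONG, df2.LAT, df2.LONG)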
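
The haversine call above works element-wise, pairing one row with one row, whereas the question asks for the closest df2 point for every df1 point. A minimal sketch of one way to do that with NumPy broadcasting, processing df1 in chunks so the distance matrix never has to fit in memory all at once. The function name nearest_tile_ids, the chunk_size parameter and the Earth radius constant are assumptions made for this illustration, not part of the original answer:

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed value)


def nearest_tile_ids(sample_latlon, tile_latlon, tile_ids, chunk_size=64):
    """Return (nearest tile id, distance in meters) for every sample point.

    sample_latlon: (n, 2) array-like of (lat, lon) in degrees
    tile_latlon:   (m, 2) array-like of (lat, lon) in degrees
    tile_ids:      array-like of length m
    """
    samples = np.radians(np.asarray(sample_latlon, dtype=float))
    tiles = np.radians(np.asarray(tile_latlon, dtype=float))
    tile_ids = np.asarray(tile_ids)

    tile_lat = tiles[:, 0][None, :]  # shape (1, m)
    tile_lon = tiles[:, 1][None, :]
    cos_tile_lat = np.cos(tile_lat)

    best_idx = np.empty(len(samples), dtype=np.intp)
    best_dist = np.empty(len(samples), dtype=float)

    for start in range(0, len(samples), chunk_size):
        stop = start + chunk_size
        lat = samples[start:stop, 0][:, None]  # shape (c, 1)
        lon = samples[start:stop, 1][:, None]
        # haversine term for every (sample, tile) pair in this chunk: shape (c, m)
        a = (np.sin((tile_lat - lat) / 2.0) ** 2
             + np.cos(lat) * cos_tile_lat * np.sin((tile_lon - lon) / 2.0) ** 2)
        # distance grows monotonically with a, so argmin of a is the nearest tile
        idx = np.argmin(a, axis=1)
        a_min = a[np.arange(a.shape[0]), idx]
        best_idx[start:stop] = idx
        best_dist[start:stop] = 2.0 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a_min))

    return tile_ids[best_idx], best_dist


# usage with a few rows taken from the question
samples = [(17.3946820179646, 78.042644262359), (17.3945761480423, 78.0427466415321)]
tiles = [(17.431952, 78.37396), (17.48295, 78.306694), (17.479734, 78.34914),
         (17.368366, 78.38604), (17.433102, 78.37506)]
ids, dist_m = nearest_tile_ids(samples, tiles, ["01", "02", "03", "04", "05"])
```

Each chunk allocates temporary arrays of chunk_size x len(tile_latlon) floats, so chunk_size has to be kept small when df2 has millions of rows. This removes the Python-level inner loop but still computes every pair of distances, so for the sizes in the question it is likely to remain heavy. A spatial index, for example scikit-learn's BallTree with metric="haversine", avoids scanning every tile for every sample and may be the better fit at that scale.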

CodePudding user response：

Or use this package, which is optimized with pybind11: https://pypi.org/project/pyhaversine/