Calculate distance b/w two data frames and result into a cross distance matrix and find nearest loca

Time:09-22

The data is as follows:

import pandas as pd
city_data = {'City': ['Delhi', 'Mumbai'],
        'Lat': [28.7041, 19.0760,],
        'Long':[77.1025,72.8777] }
person_data = {'City': ['A', 'B'],
        'Lat': [12.9716, 13.0827,],
        'Long':[77.5946,80.2707] }
df_city = pd.DataFrame(city_data)
df_person = pd.DataFrame(person_data)

Output-1 needed: [image not included]

Output-2 needed: [image not included]

The distances should be calculated with the haversine formula. There are 1000 people and 300 locations.
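For reference, the haversine distance for a single pair of points can be sketched with the standard library alone. This is a minimal illustration, not part of the original question: the function name `haversine_km` and the Earth radius of 6371 km are assumptions chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius in kilometers


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees.

    Hypothetical helper for illustration only.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Person A to Mumbai, using the coordinates from the question.
d = haversine_km(12.9716, 77.5946, 19.0760, 72.8777)
print(round(d, 1))  # about 845 km
```

Calling this in a Python loop over 1000 x 300 pairs would work but be slow, so a vectorized approach is preferable at that size.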

CodePudding user response:

Here's a way of doing this using sklearn.metrics.pairwise.haversine_distances, which computes the distance for every pair of coordinates. Note that I added another city so that the two lists would be different sizes, just to make sure the array coordinates were in the right order:

import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

city_data = {'City': ['Delhi', 'Mumbai','Jakarta'],
        'Lat': [28.7041, 19.0760,6.175],
        'Long':[77.1025,72.8777,106.8275] }
person_data = {'Person': ['A', 'B'],
        'Lat': [12.9716, 13.0827,],
        'Long':[77.5946,80.2707] }
df_city = pd.DataFrame(city_data)
df_person = pd.DataFrame(person_data)
print(df_city)
print()
print(df_person)
print()

# Extract as arrays and convert to radians.

c1 = np.radians(df_city[['Lat','Long']].to_numpy())
c2 = np.radians(df_person[['Lat','Long']].to_numpy())

# Compute distances in kilometers.

dist = haversine_distances(c2, c1) * 6371000/1000
print(dist)
print()


# Convert back to dataframe.

df = pd.DataFrame( dist, columns=df_city['City'], index=df_person['Person'])
print(df)
print()

# Sort the data and return the indexes of the closest two.

distsort = dist.argsort(axis=1)[:,:2]

# Look those up in the city names.

distsort = df_city['City'].to_numpy()[distsort]

df2 = pd.DataFrame( distsort, columns=['Closest','Next'], index=df_person['Person'])
print(df2)

Output:

      City      Lat      Long
0    Delhi  28.7041   77.1025
1   Mumbai  19.0760   72.8777
2  Jakarta   6.1750  106.8275

  Person      Lat     Long
0      A  12.9716  77.5946
1      B  13.0827  80.2707

[[1750.11476241  845.31838566 3290.21557611]
 [1767.65141115 1033.09851229 3008.41699612]]

City          Delhi       Mumbai      Jakarta
Person                                       
A       1750.114762   845.318386  3290.215576
B       1767.651411  1033.098512  3008.416996

       Closest   Next
Person               
A       Mumbai  Delhi
B       Mumbai  Delhi
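With 1000 people and 300 locations, a full argsort of every row does more work than needed when only the closest few are wanted. A possible alternative, sketched here with np.argpartition, selects the k smallest entries per row and then sorts only those. The distance matrix is copied from the printed output above; the names `k`, `nearest_idx`, `nearest_km`, and `nearest_names` are made up for this example.

```python
import numpy as np

# Distance matrix copied from the printed output above, in kilometers
# (rows: persons A, B; columns: Delhi, Mumbai, Jakarta).
dist = np.array([[1750.11476241, 845.31838566, 3290.21557611],
                 [1767.65141115, 1033.09851229, 3008.41699612]])
cities = np.array(['Delhi', 'Mumbai', 'Jakarta'])

k = 2  # number of nearest cities to keep per person (assumed)

# argpartition moves the k smallest entries of each row to the first
# k positions, in no particular order.
part = np.argpartition(dist, k - 1, axis=1)[:, :k]

# Sort only those k candidates by their actual distance.
part_dist = np.take_along_axis(dist, part, axis=1)
order = np.argsort(part_dist, axis=1)
nearest_idx = np.take_along_axis(part, order, axis=1)

# Look up the distances and the city names for the selected columns.
nearest_km = np.take_along_axis(dist, nearest_idx, axis=1)
nearest_names = cities[nearest_idx]

print(nearest_names)
print(nearest_km)
```

For a matrix this small the difference is negligible; the partition step only starts to matter as the number of locations grows.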