How to use haversine distance using haversine library on pandas dataframe

Time:04-05

Here's how I use the haversine library to calculate the distance between two points:

import haversine as hs
# haversine expects each point as (lat, lon) in degrees; the result is in kilometres by default
hs.haversine((-1.94091666666667, 106.11333888888888), (5.204783, 96.698661))

Here's how to calculate haversine distance using sklearn

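from sklearn.metrics.pairwise import haversine_distances
import numpy as np
import pandas as pd

# haversine_distances expects [lat, lon] pairs in radians
radian_1 = np.radians(df1[['lat','lon']])
radian_2 = np.radians(df2[['lat','lon']])
# multiply by the Earth radius (6371 km) to convert the result to kilometres
D = pd.DataFrame(haversine_distances(radian_1, radian_2) * 6371, index=df1.index, columns=df2.index)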

What I need is the same pairwise distance matrix, but computed with the haversine library instead of sklearn.metrics.pairwise.

Here's my dataset df1

   index       lon        lat
0   0   107.071969  -6.347778
1   1   110.431361  -7.773489
2   2   111.978469  -8.065442

and dataset df2

    index      lon        lat
5   5   112.340919  -7.520442
6   6   107.179119  -6.291131
7   7   106.807442  -6.437383

Here's the expected output

        5           6           7
    0  596.019968   13.413123   30.882602
    1  212.317223  394.942014  426.564799
    2   72.573637  565.020998  598.409848

CodePudding user response:

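You can use itertools.product to generate every (df1 point, df2 point) pair and call haversine on each pair. Note that haversine expects each point as (lat, lon), so select the columns in that order:

import haversine as hs
import pandas as pd
import numpy as np
import itertools

# haversine expects (lat, lon), so pick the columns in that order
points_1 = df1[['lat', 'lon']].values
points_2 = df2[['lat', 'lon']].values

res = []
for a, b in itertools.product(points_1, points_2):
    res.append(hs.haversine(a, b))  # kilometres by default

# one row per df1 point, one column per df2 point
df = pd.DataFrame(np.asarray(res).reshape(len(df1), len(df2)), index=df1.index, columns=df2.index)
print(df)

Passing the points in (lon, lat) order instead gives values that do not match the expected output:

            0           1           2
0  587.500555   12.058061   29.557005
1  212.580742  365.487782  405.718803
2   46.333180  537.684789  578.072579

With (lat, lon) order the values should agree with the expected output, apart from small differences caused by the slightly different Earth radius constant that haversine uses compared with the 6371 km above.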

CodePudding user response:

Following the documentation and example for sklearn.metrics.pairwise.haversine_distances:

# df1 and df2 are the question's DataFrames; 6371 is the mean Earth radius in km
result = haversine_distances(np.radians(df1[["lat", "lon"]]), np.radians(df2[["lat", "lon"]])) * 6371
result_df = pd.DataFrame(result, index=df1["index"], columns=df2["index"])

index           5           6           7
index
0      596.019968   13.413123   30.882602
1      212.317223  394.942014  426.564799
2       72.573637  565.020998  598.409848

You first need to convert the latitude and longitude from degrees to radians. haversine_distances then returns each distance as an angle on a unit sphere, so you need to multiply it by the Earth radius (about 6371 km) to get the distance in kilometres.
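
To make those two steps concrete, here is a minimal pure-Python sketch of the haversine formula. The function name haversine_km and the constant EARTH_RADIUS_KM are illustrative, not part of either library; the radius of 6371 km is the value used in the snippets above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as used in the snippets above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    # Step 1: convert degrees to radians
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))

    # Step 2: the haversine formula gives the central angle on a unit sphere
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    central_angle = 2 * math.asin(math.sqrt(h))

    # Step 3: scale the angle by the Earth radius to get kilometres
    return central_angle * EARTH_RADIUS_KM


# Row 0 of df1 against row 6 of df2 from the question
d = haversine_km(-6.347778, 107.071969, -6.291131, 107.179119)
print(round(d, 2))
```

For this pair the result should land close to the 13.413123 shown in the expected output.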
