How to map closest place by lat-long from one-to-many

Time:06-09

Here's my code

import pandas as pd
import numpy as np
from sklearn.neighbors import BallTree, DistanceMetric

# df1
N = 2000
df1 = pd.DataFrame({'name': 'name' + pd.RangeIndex(1, N + 1).astype(str),
                    'lat': np.random.uniform(30, 65, N),
                    'lon': np.random.uniform(-150, -70, N)})


# df2
N = 25000
df2 = pd.DataFrame({'sitename': 'site' + pd.RangeIndex(1, N + 1).astype(str),
                    'lat': np.random.uniform(30, 65, N),
                    'lon': np.random.uniform(-150, -70, N)})


# bts
coords = np.radians(df1[['lat', 'lon']])
dist = DistanceMetric.get_metric('haversine')
tree = BallTree(coords, metric=dist)

# airport_hostpital_1
coords = np.radians(df2[['lat', 'lon']])
distances, indices = tree.query(coords, k=1)

df1['sitename'] = df2.iloc[indices.ravel()]['sitename'].values

Here's my output

ValueError                                Traceback (most recent call last)
<ipython-input-4-7d20f0c5d9d7> in <module>
     24 distances, indices = tree.query(coords, k=1)
     25 
---> 26 df1['sitename'] = df2.iloc[indices.ravel()]['sitename'].values

ValueError: Length of values (25000) does not match length of index (2000)

My expected output:

    sitename    lat         lon             name
0   site1       46.079246   -105.782183     name1209
1   site2       49.243516   -95.104086      name1091
2   site3       63.956400   -89.549558      name91

Answer:

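Compared with my answer to your previous question, you have swapped the dataframes. tree.query returns one result per row of the array you query with, so querying with df2 produces 25000 indices, which cannot be assigned to the 2000 rows of df1. Build the tree on df2 and query it with df1 instead: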

import pandas as pd
import numpy as np
from sklearn.neighbors import BallTree, DistanceMetric

# df1
N = 2000
df1 = pd.DataFrame({'name': 'name' + pd.RangeIndex(1, N + 1).astype(str),
                    'lat': np.random.uniform(30, 65, N),
                    'lon': np.random.uniform(-150, -70, N)})


# df2
N = 25000
df2 = pd.DataFrame({'sitename': 'site' + pd.RangeIndex(1, N + 1).astype(str),
                    'lat': np.random.uniform(30, 65, N),
                    'lon': np.random.uniform(-150, -70, N)})


# Build the tree on the candidates to search (df2), with [lat, lon] in radians
coords = np.radians(df2[['lat', 'lon']])  # HERE df1 -> df2
dist = DistanceMetric.get_metric('haversine')
tree = BallTree(coords, metric=dist)

# Query with df1: returns one nearest df2 row per row of df1
coords = np.radians(df1[['lat', 'lon']])  # HERE df2 -> df1
distances, indices = tree.query(coords, k=1)

df1['sitename'] = df2.iloc[indices.ravel()]['sitename'].values

Output:

>>> df1
          name        lat         lon   sitename
0        name1  42.263207 -118.243787  site16231
1        name2  33.034391 -134.604954  site11275
2        name3  30.370661  -90.828936  site12107
3        name4  57.250977 -102.941565  site12079
4        name5  45.296180  -80.000868  site17749
...        ...        ...         ...        ...
1995  name1996  35.359411  -87.820709   site5675
1996  name1997  57.476931  -79.979884   site6402
1997  name1998  46.141786 -119.306523    site554
1998  name1999  49.551388  -86.893896   site8452
1999  name2000  55.836713  -76.379846   site5976

[2000 rows x 4 columns]
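
The rule behind the fix is that the tree is built on the candidates and queried with the rows you want to annotate, so the result always has the length of the queried frame. Below is a minimal sketch of that rule as a reusable helper. The function name attach_nearest, the distance_km column, and the Earth radius constant are my own assumptions, not part of the original code. It passes metric='haversine' as a string, so it does not depend on where DistanceMetric is imported from in your scikit-learn version. The haversine metric returns distances in radians on a unit sphere, so multiplying by an Earth radius gives an approximate distance in kilometres.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def attach_nearest(queries, candidates, label):
    """Return a copy of `queries` with the `label` column of the nearest
    row of `candidates` attached, plus the approximate distance in km.
    (Hypothetical helper; both frames need 'lat' and 'lon' in degrees.)"""
    # Build the tree on the candidates, in radians and [lat, lon] order.
    tree = BallTree(np.radians(candidates[['lat', 'lon']].to_numpy()),
                    metric='haversine')
    # One result row per row of `queries`, so lengths always match.
    dist_rad, idx = tree.query(np.radians(queries[['lat', 'lon']].to_numpy()),
                               k=1)
    out = queries.copy()
    out[label] = candidates[label].to_numpy()[idx.ravel()]
    out['distance_km'] = dist_rad.ravel() * EARTH_RADIUS_KM
    return out


# Tiny hand-made frames (illustrative data only).
points = pd.DataFrame({'name': ['name1', 'name2'],
                       'lat': [40.0, 60.0],
                       'lon': [-100.0, -80.0]})
sites = pd.DataFrame({'sitename': ['site1', 'site2', 'site3'],
                      'lat': [59.5, 40.1, 35.0],
                      'lon': [-80.5, -100.1, -120.0]})

by_point = attach_nearest(points, sites, 'sitename')  # one row per point
by_site = attach_nearest(sites, points, 'name')       # one row per site
print(by_point)
print(by_site)
```

Swapping the arguments, as in by_site, gives one row per site with its closest name. That is the layout shown in the question's expected output, so if that is what you need, the equivalent with the original frames would be to keep the tree on df1, query with df2, and assign the result to df2 rather than df1.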