Bulk process a number of latitude and longitude values from a dataframe and make a new column from t

Time:01-05

My DataFrame has two columns, latitude and longitude. I want to use them to compute a new test_url column that holds the country for each row.

I'm using the Nominatim (OpenStreetMap) reverse-geocoding API URL for this.

My imports:

import pandas as pd
import requests

My check_country function:

def check_country(url):
    
    r = requests.get(url)
    results = r.json()['address']
    
    return results['country']

Column calculation:

df['test_url'] = df[['latitude', 'longitude']].apply(lambda x: check_country(f"https://nominatim.openstreetmap.org/reverse?lat={x['latitude']}&lon={x['longitude']}&format=json"), axis=1)

But with this I'm getting a connection error.

Error

ConnectionError: HTTPSConnectionPool(host='nominatim.openstreetmap.org', port=443): Max retries exceeded with url: /reverse?lat=10.75161&lon=77.11299&format=json
(Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002262FC94C40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
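If you want to keep the Nominatim approach from the question, a more defensive version might look like the sketch below. It will not by itself cure [WinError 10061]: that error is raised before any HTTP exchange takes place, when the TCP connection is refused, which often points to a local firewall or proxy rather than to the code. What the sketch adds is an identifying User-Agent and a pause between calls (Nominatim's usage policy asks for both, with at most one request per second), a timeout, raise_for_status(), and a cache so repeated coordinate pairs are only requested once. The names reverse_country and add_country_nominatim and the User-Agent string are hypothetical placeholders.

```python
import time

import pandas as pd
import requests

NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"
# Hypothetical placeholder: replace with an identifying User-Agent for your own app.
HEADERS = {"User-Agent": "example-country-lookup/0.1 (contact: you@example.com)"}


def reverse_country(lat, lon, session, timeout=10):
    """Reverse-geocode one point and return its country name, or None if absent."""
    params = {"lat": lat, "lon": lon, "format": "json"}
    r = session.get(NOMINATIM_REVERSE, params=params, headers=HEADERS, timeout=timeout)
    r.raise_for_status()  # surface HTTP errors instead of a KeyError further down
    # Points at sea return an error payload with no 'address'; fall back to None.
    return r.json().get("address", {}).get("country")


def add_country_nominatim(df, session=None, delay=1.0):
    """Return a copy of df with a 'country' column, one request per distinct pair."""
    session = session or requests.Session()
    cache = {}
    for lat, lon in zip(df["latitude"], df["longitude"]):
        if (lat, lon) not in cache:
            cache[(lat, lon)] = reverse_country(lat, lon, session)
            time.sleep(delay)  # stay at or below roughly one request per second
    out = df.copy()
    out["country"] = [cache[(lat, lon)] for lat, lon in zip(df["latitude"], df["longitude"])]
    return out
```

Usage: df = add_country_nominatim(df). For large DataFrames the one-second pause makes this slow, which is one reason to prefer the local lookup described in the answer.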

Answer:

You can use GeoPandas with the "World Administrative Boundaries" dataset to do the lookup locally, without any HTTP requests. First download the GeoJSON file and install geopandas, then:

# Python env: pip install geopandas
# Anaconda env: conda install geopandas

import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.read_file('world-administrative-boundaries.geojson')
p = Point(77.11299, 10.75161)

out = gdf.loc[gdf.intersects(p), 'name']
print(out)

# Output:
226    India
Name: name, dtype: object

Advanced usage, multiple coordinates (note that Point takes longitude first, then latitude):

coords = [(40.730610, -73.935242), (10.75161, 77.11299)]
points = [Point(lon, lat) for lat, lon in coords]
dfp = gpd.GeoDataFrame({'geometry': points}, crs=gdf.crs)
out = gpd.sjoin(dfp, gdf, predicate='within')
print(out)

# Output
                     geometry  index_right           french_short iso3        status iso_3166_1_alpha_2_codes                      name            region color_code continent
0  POINT (-73.93524 40.73061)          182  États-Unis d'Amérique  USA  Member State                       US  United States of America  Northern America        USA  Americas
1   POINT (77.11299 10.75161)          226                   Inde  IND  Member State                       IN                     India     Southern Asia        IND      Asia
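Since the question's goal is a new column on the existing DataFrame, the same spatial join can be applied to it directly. A minimal sketch, assuming world is the GeoDataFrame read from the boundaries file (lon/lat degrees, country names in the name column, as in the output above) and that df has latitude/longitude columns and a unique index; the helper name add_country_local is hypothetical. gpd.points_from_xy builds the points from the two columns, and a left join keeps rows that fall outside every polygon (for example at sea) as NaN.

```python
import geopandas as gpd
import pandas as pd


def add_country_local(df, world, lat_col="latitude", lon_col="longitude", name_col="name"):
    """Return a copy of df with a 'country' column looked up from world polygons."""
    points = gpd.GeoDataFrame(
        df[[lat_col, lon_col]],
        # shapely/GeoPandas use (x, y) order, i.e. longitude first, then latitude
        geometry=gpd.points_from_xy(df[lon_col], df[lat_col]),
        crs=world.crs,  # assumes world is in lon/lat degrees
    )
    joined = gpd.sjoin(points, world[[name_col, "geometry"]], how="left", predicate="within")
    # If a point falls inside more than one polygon (overlapping boundaries),
    # keep the first match so there is one row per original row.
    joined = joined[~joined.index.duplicated(keep="first")]
    out = df.copy()
    out["country"] = joined[name_col]  # aligned on df's index; NaN where nothing matched
    return out
```

Usage: df = add_country_local(df, gpd.read_file('world-administrative-boundaries.geojson')). This does a single vectorized join instead of one network call per row.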