Python: get district and state columns for each lat/long coordinate with geopy

Time:02-24

I have a list of 200 latitude and longitude coordinate pairs.

From these coordinate pairs I want to build a dataframe with a district column and a state column, so the dataframe will have 3 columns: cord, district and state.

I am using the geopy library for this, but I am unable to get records for more than 115 coordinates.

Sample Data

    cord
0   (19.4, 17.93)
1   (55.54, 93.93)
2   (52.45, 78.93)
3   (65.54, 67.93)
4   (47.74, 99.93)


Required Output Demo

    cord            district state
0   (19.4, 17.93)   xyz      aaa
1   (55.54, 93.93)  adc      aaa
2   (52.45, 78.93)  gyu      drt
3   (65.54, 67.93)  www      bhn
4   (47.74, 99.93)  ccf      bvg


I have tried this code, but I am unable to fetch details for more than 115 queries.

from geopy.geocoders import Nominatim
district = {}  # maps each coordinate to its district name
# list of all the coordinates in (lat, long) format; two pairs from the sample data shown here
geo_loc = [(19.4, 17.93), (55.54, 93.93)]
geolocator = Nominatim(user_agent='user_agent')  # create the geocoder once, not per iteration
for cord in geo_loc:
    location = geolocator.reverse(cord, addressdetails=True)
    district[cord] = location.raw['address']['state_district']


I need to fetch a maximum of 500 unique coordinates at a time.
I also need the district and state names in separate columns.

Answer:

The Nominatim Usage Policy forbids heavy usage: "No heavy uses (an absolute maximum of 1 request per second)." You can use geopy's RateLimiter to send at most 1 request per second. I have tested that the following code works for more than 115 requests:

from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
import pandas as pd
geolocator = Nominatim(user_agent="user_agent")
# wrap reverse() so that calls are at least 1 second apart
reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)
state_list = []  # initialize empty list
# create dataframe
df = pd.DataFrame({"geo_loc": [(19.4, 17.93), (55.54, 93.93), (52.45, 78.93), (65.54, 67.93), (47.74, 99.93)]})
# get location coordinates
geo_loc = df.geo_loc.values
for cord in geo_loc:
    # send request
    location = reverse(cord, addressdetails=True)
    # get state value (None if nothing was found or the address has no state)
    state = location.raw["address"].get("state") if location else None
    # store state value
    state_list.append(state)
# assign states back to the dataframe
df['states'] = state_list
print(df)

Resulting dataframe:

        geo_loc                           states
0   (19.4, 17.93)                   Tibesti تيبستي
1  (55.54, 93.93)                Красноярский край
2  (52.45, 78.93)                   Алтайский край
3  (65.54, 67.93)  Ямало-Ненецкий автономный округ
4  (47.74, 99.93)                         Архангай
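
The code above only fills a state column, while the question also asks for a district column. A minimal sketch of one way to get both follows. The helper names (extract_admin, build_rows, run_with_nominatim) are hypothetical and not part of geopy, and the fallback from "state_district" to "county" is an assumption, since the keys Nominatim returns in the address vary by country. Repeated coordinates are looked up only once, which keeps the request count down.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Coord = Tuple[float, float]
Admin = Tuple[Optional[str], Optional[str]]


def extract_admin(address: Dict[str, str]) -> Admin:
    """Return (district, state) from a Nominatim-style address dict."""
    # Assumption: fall back to "county" when "state_district" is absent.
    district = address.get("state_district") or address.get("county")
    state = address.get("state")
    return district, state


def build_rows(coords: Iterable[Coord], reverse: Callable) -> List[dict]:
    """Reverse-geocode each unique coordinate once; return one row per input."""
    cache: Dict[Coord, Admin] = {}
    rows = []
    for cord in coords:
        if cord not in cache:
            location = reverse(cord, addressdetails=True)
            # reverse() may return None when nothing is found.
            address = location.raw.get("address", {}) if location else {}
            cache[cord] = extract_admin(address)
        district, state = cache[cord]
        rows.append({"cord": cord, "district": district, "state": state})
    return rows


def run_with_nominatim(coords: Iterable[Coord]):
    """Hypothetical wiring to geopy and pandas; performs real network requests."""
    import pandas as pd
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent="user_agent")
    # Keep calls at least 1 second apart, as in the answer above.
    reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)
    return pd.DataFrame(build_rows(coords, reverse))
```

Calling run_with_nominatim(geo_loc) should give a dataframe with the columns cord, district and state. Coordinates for which Nominatim returns no result, or no matching address key, get None in the corresponding column.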