How to get city, state, and country from a list of latitude and longitude coordinates?

Time:10-02

I have a list of 500,000 latitude and longitude coordinates like the ones below:

Latitude   Longitude  
42.022506  -88.168156  
41.877445  -87.723846  
29.986801  -90.166314  

I want to use Python to get the city, state, and country for each coordinate in new columns, like below:

Latitude   Longitude   City        State   Country
42.022506  -88.168156  Streamwood  IL      United States
41.877445  -87.723846  Chicago     IL      United States
29.986801  -90.166314  Metairie    LA      United States

With a dataset this large, how can this be achieved in Python? I have heard of Google's API, Nominatim's API, and the Geopy package.

How do I run this code over all of the rows? Right now I have to manually type the latitude and longitude into the last line.

import csv 
import pandas as pd
import numpy as np
import math
from geopy.geocoders import Nominatim

input_file = "Lat-Log.csv" # file contains ID, Latitude, Longitude
output_file = "output.csv"
df = pd.read_csv(input_file) 

geolocator = Nominatim(user_agent="geoapiExercises")
def city_state_country(coord):
    location = geolocator.reverse(coord, exactly_one=True)
    address = location.raw['address']
    city = address.get('city', '')
    state = address.get('state', '')
    country = address.get('country', '')
    return city, state, country
print(city_state_country("47.470706, -99.704723"))

The output gives me ('Bowdon', 'North Dakota', 'USA'). I am looking to replace the coordinates with my columns (latitude and longitude) to run through my list. How do I input my columns into the code to run through the whole document?

CodePudding user response:

You want to run a function on each row, which can be done using apply().

There are two complications: you want to 1) pass multiple arguments to the function, and 2) get back multiple results.

These questions explain how to do those things:
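As a rough, self-contained illustration of both points (not the geocoding code itself), the sketch below uses a toy function, here called `describe`, that reads two columns from each row and returns two values. The function and the `NS`/`EW` column names are made up for the example. Passing `axis=1` hands each row to the function, and `result_type="expand"` spreads the returned tuple over separate columns.

```python
import pandas as pd

df = pd.DataFrame({"Latitude": [42.022506, 29.986801],
                   "Longitude": [-88.168156, -90.166314]})

def describe(row):
    # Toy stand-in for a real lookup: reads two columns, returns two values.
    north_south = "N" if row["Latitude"] >= 0 else "S"
    east_west = "E" if row["Longitude"] >= 0 else "W"
    return north_south, east_west

# axis=1 passes each row to the function; result_type="expand" turns the
# returned tuple into one column per element.
df[["NS", "EW"]] = df.apply(describe, axis=1, result_type="expand")
print(df)
```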

Here's how to adapt your code to do this:

import pandas as pd
import io
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="geoapiExercises")

s = """Latitude   Longitude  
42.022506  -88.168156  
41.877445  -87.723846  
29.986801  -90.166314"""

df = pd.read_csv(io.StringIO(s), sep=r"\s+")  # delim_whitespace=True is deprecated in recent pandas

def city_state_country(row):
    coord = f"{row['Latitude']}, {row['Longitude']}"
    location = geolocator.reverse(coord, exactly_one=True)
    address = location.raw['address']
    city = address.get('city', '')
    state = address.get('state', '')
    country = address.get('country', '')
    row['city'] = city
    row['state'] = state
    row['country'] = country
    return row

df = df.apply(city_state_country, axis=1)
print(df)

(I replaced your read_csv() call with an inline definition of the dataframe, only to make the example self-contained. It's not important to the example, so keep your own read_csv() call.)

The city_state_country() function gets called once for every row of the dataframe. (The axis=1 argument makes apply() run over rows rather than columns.) The function reads the row's latitude and longitude and sends one reverse-geocoding query. Then it adds the city, state, and country from the response to the row and returns it.
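Since every row triggers a network request, 500,000 rows will take a long time. The public Nominatim server asks for at most one request per second, which works out to roughly six days for that many lookups. One way to cut this down, sketched below under assumptions, is to query each distinct coordinate pair only once, pause between requests, and merge the results back. `reverse_unique` and `fake_lookup` are hypothetical names. In real use, `lookup` would be a small wrapper around `geolocator.reverse()` that returns a (city, state, country) tuple. geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that could replace the manual sleep.

```python
import time
import pandas as pd

def reverse_unique(df, lookup, min_delay_seconds=1.0):
    """Call lookup(lat, lon) once per distinct coordinate pair, then merge back."""
    unique = df[["Latitude", "Longitude"]].drop_duplicates().reset_index(drop=True)
    results = []
    for i, (lat, lon) in enumerate(unique.itertuples(index=False)):
        if i:  # pause between requests, not before the first one
            time.sleep(min_delay_seconds)
        results.append(lookup(lat, lon))
    places = pd.DataFrame(results, columns=["city", "state", "country"])
    unique = pd.concat([unique, places], axis=1)
    # A left merge keeps every original row, including repeated coordinates.
    return df.merge(unique, on=["Latitude", "Longitude"], how="left")

# Demonstration with a stub instead of a real geocoder (assumption: the real
# lookup would return the same kind of 3-tuple).
calls = []

def fake_lookup(lat, lon):
    calls.append((lat, lon))
    return "", "Illinois", "United States"

df = pd.DataFrame({"Latitude": [42.022506, 41.877445, 42.022506],
                   "Longitude": [-88.168156, -87.723846, -88.168156]})
out = reverse_unique(df, fake_lookup, min_delay_seconds=0)
print(out)
print(len(calls), "lookups for", len(df), "rows")
```

The saving depends entirely on how many coordinates repeat in the file. For a list with few duplicates, a local or self-hosted geocoder may be the more practical route.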

Running the adapted code gives the following result:

    Latitude  Longitude     city      state        country
0  42.022506 -88.168156            Illinois  United States
1  41.877445 -87.723846  Chicago   Illinois  United States
2  29.986801 -90.166314           Louisiana  United States

This is not the same as your example, because Nominatim doesn't seem to return a city for two of your coordinates. (It calls them towns, not cities.)
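If you want those blanks filled in, one option is to fall back to other place-name fields when 'city' is missing. The sketch below assumes the address dictionary may carry 'town', 'village', or 'hamlet' keys instead of 'city'. The helper names and the sample dictionary are made up for illustration, not taken from a real response. It also guards against reverse() returning None, which geopy does when nothing is found.

```python
from types import SimpleNamespace

def pick_city(address):
    """Return the first populated place-name field in an address dict, or ''."""
    for key in ("city", "town", "village", "hamlet"):
        if address.get(key):
            return address[key]
    return ""

def city_state_country_from(location):
    # location is whatever geolocator.reverse() returned; it can be None.
    if location is None:
        return "", "", ""
    address = location.raw.get("address", {})
    return pick_city(address), address.get("state", ""), address.get("country", "")

# Hand-written stand-in for a geopy Location object (not real Nominatim output).
sample = SimpleNamespace(raw={"address": {"town": "Sampletown",
                                          "state": "Louisiana",
                                          "country": "United States"}})
print(city_state_country_from(sample))
print(city_state_country_from(None))
```

Inside city_state_country(row), the three assignments could then take their values from city_state_country_from(location).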
