I have a list of 500,000 latitude and longitude coordinates like the ones below:
Latitude Longitude
42.022506 -88.168156
41.877445 -87.723846
29.986801 -90.166314
I want to use Python to get the city, state, and country for each coordinate in new columns, like below:
Latitude Longitude City State Country
42.022506 -88.168156 Streamwood IL United States
41.877445 -87.723846 Chicago IL United States
29.986801 -90.166314 Metairie LA United States
With a dataset this large, how can this be achieved in Python? I have heard of Google's API, Nominatim's API, and the geopy package.
How do I run all of the rows through this code? Right now I have to manually type the latitude and longitude into the last line.
import csv
import pandas as pd
import numpy as np
import math
from geopy.geocoders import Nominatim

input_file = "Lat-Log.csv"  # file contains ID, Latitude, Longitude
output_file = "output.csv"
df = pd.read_csv(input_file)

geolocator = Nominatim(user_agent="geoapiExercises")

def city_state_country(coord):
    location = geolocator.reverse(coord, exactly_one=True)
    address = location.raw['address']
    city = address.get('city', '')
    state = address.get('state', '')
    country = address.get('country', '')
    return city, state, country

print(city_state_country("47.470706, -99.704723"))
The output gives me ('Bowdon', 'North Dakota', 'USA'). I am looking to replace the coordinates with my columns (latitude and longitude) to run through my list. How do I input my columns into the code to run through the whole document?
CodePudding user response:
You want to run a function on each row, which can be done using apply().
There are two complications: you want to 1) provide multiple arguments to the function, and 2) get back multiple results.
These questions explain how to do those things:
- python pandas- apply function with two arguments to columns
- Return multiple columns from pandas apply()
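As a rough offline sketch of those two techniques on their own: the hemisphere() function below is a hypothetical stand-in for a geocoder call (it is not part of geopy), and the column names lat_hemisphere and lon_hemisphere are made up for illustration. It reads two columns from each row and returns two values, which apply() spreads into two columns.

```python
import pandas as pd

# Hypothetical stand-in for a geocoder lookup, so the sketch runs without a network.
def hemisphere(lat, lon):
    return ("N" if lat >= 0 else "S", "E" if lon >= 0 else "W")

df = pd.DataFrame({
    "Latitude": [42.022506, 41.877445, 29.986801],
    "Longitude": [-88.168156, -87.723846, -90.166314],
})

def lookup_row(row):
    # 1) multiple arguments: read both columns from the row
    # 2) multiple results: return them together as a tuple
    return hemisphere(row["Latitude"], row["Longitude"])

# result_type="expand" turns each returned tuple into separate columns
result = df.apply(lookup_row, axis=1, result_type="expand")
result.columns = ["lat_hemisphere", "lon_hemisphere"]

# Attach the new columns to the original rows (the indexes match)
df = df.join(result)
print(df)
```

Returning a tuple and expanding it is one option; modifying and returning the row itself works as well.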
Here's how to adapt your code to do this:
import pandas as pd
import io
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="geoapiExercises")

s = """Latitude Longitude
42.022506 -88.168156
41.877445 -87.723846
29.986801 -90.166314"""
# sep=r"\s+" splits on whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(io.StringIO(s), sep=r"\s+")

def city_state_country(row):
    coord = f"{row['Latitude']}, {row['Longitude']}"
    location = geolocator.reverse(coord, exactly_one=True)
    # reverse() returns None when nothing is found, so fall back to an empty address
    address = location.raw['address'] if location else {}
    city = address.get('city', '')
    state = address.get('state', '')
    country = address.get('country', '')
    row['city'] = city
    row['state'] = state
    row['country'] = country
    return row

df = df.apply(city_state_country, axis=1)
print(df)
(I replaced your read_csv() call with an inline definition of the dataframe to make the example self-contained. It isn't important to the example.)
The city_state_country() function gets called with every row of the dataframe. (The axis=1 argument makes apply() run using rows rather than columns.) The function gets the latitude and longitude, and does a query. Then, it modifies the row to include the information from the query.
This gets the following result:
    Latitude   Longitude     city      state        country
0  42.022506  -88.168156            Illinois  United States
1  41.877445  -87.723846  Chicago   Illinois  United States
2  29.986801  -90.166314           Louisiana  United States
This is not the same as your example, because Nominatim doesn't seem to return a city for two of your coordinates. (It calls them towns, not cities.)
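If you want those rows filled in, one possible approach is to fall back to other place keys when 'city' is missing. This is only a sketch: pick_place() is a hypothetical helper, and the key names and their order in PLACE_KEYS are my assumption about what a Nominatim address may contain, so check them against your own results. The sample dictionaries are made up, not real query results.

```python
# Assumed fallback order; verify these keys against the addresses you actually get back.
PLACE_KEYS = ("city", "town", "village", "hamlet", "municipality")

def pick_place(address):
    """Return the first non-empty place name in a Nominatim-style address dict."""
    for key in PLACE_KEYS:
        name = address.get(key)
        if name:
            return name
    return ""  # nothing usable found

# Made-up addresses for illustration only
print(pick_place({"town": "Exampletown", "state": "Illinois"}))
print(pick_place({"city": "Chicago", "state": "Illinois"}))
```

Inside city_state_country() you could then use city = pick_place(address) in place of address.get('city', '').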