I have a problem. I want to get the long and lat coordinates from an address. This works so far, but the same address can appear multiple times. Is there a way to check directly in the method whether an address already has a long and lat value and, if so, reuse that value instead of querying it again via geolocator.geocode(df['address'])?
Dataframe
address customer
0 Surlej, 7513, Silvaplana, Schweiz 1
1 Vodnikova cesta 35, 1000 Ljubljana, Slowenien 2
2 Surlej, 7513, Silvaplana, Schweiz 1
Code
import pandas as pd
d = {
    "address": ['Surlej, 7513, Silvaplana, Schweiz',
                'Vodnikova cesta 35, 1000 Ljubljana, Slowenien',
                'Surlej, 7513, Silvaplana, Schweiz'],
    "customer": [1, 2, 1],
}
df = pd.DataFrame(data=d)
print(df)
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent='testing_stackoverflow')
def adressToLatLong(df):
    # With axis=1, df is a single row here; geocode() returns None if nothing is found
    location = geolocator.geocode(df['address'])
    #print(location)
    if location is None:
        df['latitude'] = None
        df['longitude'] = None
    else:
        df['latitude'] = location.latitude
        df['longitude'] = location.longitude
    return df
df = df.apply(lambda x: adressToLatLong(x), axis=1)
What I want
address customer latitude \
0 Surlej, 7513, Silvaplana, Schweiz 1 46.459902
1 Vodnikova cesta 35, 1000 Ljubljana, Slowenien 2 46.065523
2 Surlej, 7513, Silvaplana, Schweiz 1 46.459902
longitude
0 9.803370
1 14.490775
2 9.803370
I thought you could create a dict and merge it into the df later.
def adressToLatLong(addresses):
    adresses_new = {}
    for address in addresses:
        location = geolocator.geocode(df['address'])
        #print(location)
        if (location == None):
            df['address'] = None
            df['address'] = None
        else:
            adresses_new.update({'adress': address, 'long': location.longitude, 'lat': location.latitude})
    return adresses_new
adressToLatLong(df['address'].unique())
[OUT] {}
CodePudding user response:
Use a cache. For this you must write a function that takes an address (as a string) as input and returns the lat/lon:
from functools import cache
@cache
def function_that_returns_lat_lon_from_address(address):
    # ...
    return (lat, lon)
# Series.apply has no result_type argument, so unpack the (lat, lon) tuples via a list
df[['lat', 'lon']] = df['address'].apply(function_that_returns_lat_lon_from_address).tolist()
For Python < 3.9, use lru_cache:
from functools import lru_cache
@lru_cache(maxsize=None)
def function_that_returns_lat_lon_from_address(address):
    ...
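To see that the cache really avoids repeated lookups, here is a minimal offline sketch. It is not the geopy call itself: fake_geocode, FAKE_COORDS, lookups and address_to_lat_lon are hypothetical names introduced for illustration, and the coordinates are the ones from the expected output in the question. In real code, the body of the cached function would presumably call geolocator.geocode(address) and return (location.latitude, location.longitude).

```python
from functools import lru_cache

# Hypothetical stand-in for geolocator.geocode so the sketch runs without
# network access; the coordinates are copied from the question.
FAKE_COORDS = {
    "Surlej, 7513, Silvaplana, Schweiz": (46.459902, 9.803370),
    "Vodnikova cesta 35, 1000 Ljubljana, Slowenien": (46.065523, 14.490775),
}

lookups = []  # every address that actually reaches the "geocoder"


def fake_geocode(address):
    """Record the lookup and return (lat, lon), or None if unknown."""
    lookups.append(address)
    return FAKE_COORDS.get(address)


@lru_cache(maxsize=None)
def address_to_lat_lon(address):
    """Return (lat, lon) for an address; repeated addresses hit the cache."""
    result = fake_geocode(address)
    if result is None:
        return (None, None)
    return result


addresses = [
    "Surlej, 7513, Silvaplana, Schweiz",
    "Vodnikova cesta 35, 1000 Ljubljana, Slowenien",
    "Surlej, 7513, Silvaplana, Schweiz",
]

coords = [address_to_lat_lon(a) for a in addresses]
print(coords)
print(len(lookups))  # 2: the duplicate address was served from the cache
```

The cache lives only for the lifetime of the process, and the argument must be hashable, which is why the function takes the address string rather than a whole row.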