Does the address already exist, if so, use the known data

Time:08-10

I have a problem. I want to get the longitude and latitude coordinates for an address. This works so far, but the same address may appear multiple times. Is there a way to check directly in the method whether an address already has longitude and latitude values and, if so, reuse them instead of querying again via geolocator.geocode(df['address'])?

Dataframe

                                         address  customer
0              Surlej, 7513, Silvaplana, Schweiz         1
1  Vodnikova cesta 35, 1000 Ljubljana, Slowenien         2
2              Surlej, 7513, Silvaplana, Schweiz         1

Code

import pandas as pd
d = {
    "address": ['Surlej, 7513, Silvaplana, Schweiz', 
                'Vodnikova cesta 35, 1000 Ljubljana, Slowenien', 'Surlej, 7513, Silvaplana, Schweiz',],
    "customer": [1, 2, 1],
}
df = pd.DataFrame(data=d)
print(df)

from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent='testing_stackoverflow')
def adressToLatLong(df):
    
    location = geolocator.geocode(df['address'])
    #print(location)
    if location is None:
        # No match found: leave the coordinates empty
        df['latitude'] = None
        df['longitude'] = None
    else:
        df['latitude'] = location.latitude
        df['longitude'] = location.longitude
    return df   

df = df.apply(lambda  x: adressToLatLong(x), axis=1)

What I want

                                         address  customer   latitude  \
0              Surlej, 7513, Silvaplana, Schweiz         1  46.459902   
1  Vodnikova cesta 35, 1000 Ljubljana, Slowenien         2  46.065523   
2              Surlej, 7513, Silvaplana, Schweiz         1  46.459902   

   longitude  
0   9.803370  
1  14.490775  
2   9.803370  

I thought I could create a dict and merge it into the df later:

def adressToLatLong(addresses):
    adresses_new = {}
    for address in addresses:
      location = geolocator.geocode(df['address'])
      #print(location)
      if (location == None):
          df['address'] = None
          df['address'] = None
      else:
        adresses_new.update({'adress': address, 'long': location.longitude, 'lat': location.latitude})   
    return adresses_new
adressToLatLong(df['address'].unique())

[OUT] {}

CodePudding user response:

Use a cache. For this, write a function that takes an address (as a string) as input and returns the lat/lon as a tuple:

from functools import cache

@cache
def function_that_returns_lat_lon_from_address(address):
    # ...
    return (lat, lon) 

# Series.apply has no result_type parameter, so expand the tuples via a list
df[['lat', 'lon']] = df['address'].apply(function_that_returns_lat_lon_from_address).tolist()
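To make the elided function body concrete, here is a minimal sketch of how the cached lookup could be filled in. The names `make_cached_lat_lon`, `fake_geocode`, `FAKE_DB` and `calls` are hypothetical and not part of geopy; `fake_geocode` is only a stand-in so the example runs offline. In real use, `geolocator.geocode` from the question would presumably be passed in its place.

```python
from collections import namedtuple
from functools import lru_cache

import pandas as pd

# Hypothetical stand-in for geolocator.geocode so the sketch runs offline.
# It mimics only the .latitude / .longitude attributes used in the question.
Location = namedtuple("Location", ["latitude", "longitude"])
FAKE_DB = {
    "Surlej, 7513, Silvaplana, Schweiz": Location(46.459902, 9.803370),
    "Vodnikova cesta 35, 1000 Ljubljana, Slowenien": Location(46.065523, 14.490775),
}
calls = []  # records every uncached lookup, to make the cache visible


def fake_geocode(address):
    calls.append(address)
    return FAKE_DB.get(address)  # None when the address is unknown


def make_cached_lat_lon(geocode):
    """Wrap a geocode callable so each distinct address is looked up once."""

    @lru_cache(maxsize=None)
    def lat_lon(address):
        location = geocode(address)
        if location is None:
            return (None, None)
        return (location.latitude, location.longitude)

    return lat_lon


df = pd.DataFrame({
    "address": [
        "Surlej, 7513, Silvaplana, Schweiz",
        "Vodnikova cesta 35, 1000 Ljubljana, Slowenien",
        "Surlej, 7513, Silvaplana, Schweiz",
    ],
    "customer": [1, 2, 1],
})

lat_lon = make_cached_lat_lon(fake_geocode)
coords = df["address"].map(lat_lon)  # Series of (lat, lon) tuples
df["latitude"] = [lat for lat, _ in coords]
df["longitude"] = [lon for _, lon in coords]

print(df)
print(len(calls))  # prints 2: the repeated address was served from the cache
```

The cache lives only as long as the Python process, so a fresh run queries every address again. Returning `(None, None)` for unknown addresses keeps the result a two-element tuple, which is what lets it be split into two columns.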

For Python < 3.9, use lru_cache:

from functools import lru_cache

@lru_cache(maxsize=None)
def function_that_returns_lat_lon_from_address(address):
    ...
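The dict idea from the question can also be made to work without functools: geocode each unique address once, store the result under the address as key, and join the lookup table back onto the DataFrame. The following is a sketch under the same assumptions as before; `geocode_unique` and `fake_geocode` are hypothetical names, and the stand-in geocoder would be replaced by `geolocator.geocode` in real use.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical offline stand-in for geolocator.geocode.
Location = namedtuple("Location", ["latitude", "longitude"])
FAKE_DB = {
    "Surlej, 7513, Silvaplana, Schweiz": Location(46.459902, 9.803370),
    "Vodnikova cesta 35, 1000 Ljubljana, Slowenien": Location(46.065523, 14.490775),
}
lookups = []  # records which addresses were actually queried


def fake_geocode(address):
    lookups.append(address)
    return FAKE_DB.get(address)


def geocode_unique(addresses, geocode):
    """Return {address: (lat, lon)}, querying each distinct address once."""
    coords = {}
    for address in addresses:
        if address in coords:
            continue  # already known, do not query again
        location = geocode(address)
        if location is None:
            coords[address] = (None, None)
        else:
            coords[address] = (location.latitude, location.longitude)
    return coords


df = pd.DataFrame({
    "address": [
        "Surlej, 7513, Silvaplana, Schweiz",
        "Vodnikova cesta 35, 1000 Ljubljana, Slowenien",
        "Surlej, 7513, Silvaplana, Schweiz",
    ],
    "customer": [1, 2, 1],
})

coords = geocode_unique(df["address"].unique(), fake_geocode)

# One row per address, indexed by the address string
lookup = pd.DataFrame.from_dict(
    coords, orient="index", columns=["latitude", "longitude"]
)
result = df.join(lookup, on="address")
print(result)
```

Compared with the attempt in the question, each address is its own key, so later entries no longer overwrite earlier ones, and the loop variable `address` is what gets geocoded rather than the whole column.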