Iterrows should not fill in all rows in a column

Time:05-05

I am looking for a way to prevent every cell in a column from being filled with the same value.

I have a list of locations and want to extract their geo coordinates (latitude and longitude). Unfortunately, each pass of my iterrows() loop changes not just the cell in the current row but every row of the columns "ADDRESS_LAT" and "ADDRESS_LONG". The cell values end up identical even though the locations are different.

How can I fill in only the cell in the respective row? Is iterrows() the wrong approach, and does this need a function that is called with .apply() instead?

Any hints or solutions much appreciated!

import numpy as np
import pandas as pd
from geopy.geocoders import Nominatim
from geopy import distance


data = {'Address': ["60311, Germany", "23769, Germany"]}
df = pd.DataFrame(data)
print(df)

geolocator = Nominatim(user_agent="myapp")

# iterate through geocode input
for index, row in df.iterrows():
    location_geocode = geolocator.geocode(row['Address'])

    # create new columns to store geocode latitude and longitude
    df['ADDRESS_LAT'] = location_geocode.latitude
    df['ADDRESS_LONG'] = location_geocode.longitude

    # Problem: every row gets the same values
    print(df)

OUTPUT:

          Address
0  60311, Germany
1  23769, Germany
          Address  ADDRESS_LAT  ADDRESS_LONG
0  60311, Germany    50.111296       8.68318
1  23769, Germany    50.111296       8.68318
          Address  ADDRESS_LAT  ADDRESS_LONG
0  60311, Germany    54.448025     11.168252
1  23769, Germany    54.448025     11.168252

CodePudding user response:

If you have to use iterrows(), you can collect the values in lists and assign the lists as columns after the loop:

# --- before loop ---

col_lat  = []
col_long = []

# --- loop ---

for index, row in df.iterrows():
    location_geocode = geolocator.geocode(row['Address'])
    col_lat.append(location_geocode.latitude)
    col_long.append(location_geocode.longitude)

# --- after loop ---

df['ADDRESS_LAT']  = col_lat
df['ADDRESS_LONG'] = col_long

print(df)
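
As for why the original loop overwrote every row: assigning a single value to df['ADDRESS_LAT'] sets the whole column, so each iteration replaced all rows with the latest result. Here is a minimal offline sketch of the difference. FAKE_COORDS is a hypothetical stand-in for geolocator.geocode(), filled with the coordinates from the question's output, so no network call is made.

```python
import pandas as pd

# Hypothetical stand-in for geolocator.geocode(); values taken from the question's output.
FAKE_COORDS = {
    "60311, Germany": (50.111296, 8.68318),
    "23769, Germany": (54.448025, 11.168252),
}

df = pd.DataFrame({"Address": ["60311, Germany", "23769, Germany"]})

# Scalar assignment: the single value is broadcast to every row of the column.
broadcast = df.copy()
broadcast["ADDRESS_LAT"] = FAKE_COORDS["60311, Germany"][0]

# List assignment: one value per row, matched by position.
per_row = df.copy()
per_row["ADDRESS_LAT"] = [FAKE_COORDS[address][0] for address in per_row["Address"]]

print(broadcast)
print(per_row)
```

The list must have exactly one entry per row, which is why the loop above appends on every iteration.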

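Or create the columns first and then write one cell per iteration. Write through df.at[index, column] rather than through row: according to the pandas documentation, iterrows() may return a copy of each row, so assigning to row is not guaranteed to change df.

# --- before loop ---

# float placeholders until geocoded (np is numpy, imported in the question)
df['ADDRESS_LAT']  = np.nan
df['ADDRESS_LONG'] = np.nan

# --- loop ---

for index, row in df.iterrows():
    location_geocode = geolocator.geocode(row['Address'])
    # write only the cell at (index, column)
    df.at[index, 'ADDRESS_LAT']  = location_geocode.latitude
    df.at[index, 'ADDRESS_LONG'] = location_geocode.longitude

# --- after loop ---

print(df)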
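
To check that a write through df.at (pandas' accessor for a single cell) touches only one row, here is a small offline sketch. The latitude is copied from the question's output, and the NaN placeholder is an assumption chosen to keep the column numeric.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Address": ["60311, Germany", "23769, Germany"]})
df["ADDRESS_LAT"] = np.nan  # float placeholder, keeps the column numeric

# Write only the first row's cell; the second row keeps its placeholder.
df.at[0, "ADDRESS_LAT"] = 50.111296

print(df)
```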

You can also use .apply() and return both values as a Series:

def convert(row):
    location_geocode = geolocator.geocode(row['Address'])
    return pd.Series([location_geocode.latitude, location_geocode.longitude])

df[['ADDRESS_LAT', 'ADDRESS_LONG']] = df.apply(convert, axis=1)

print(df)
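
One thing to watch for: geopy's geocode() returns None when it finds no match, and accessing .latitude on None raises an AttributeError. A possible guard is sketched below. fake_geocode, FAKE_RESULTS and convert_safe are hypothetical names, and the stub replaces the real Nominatim call so the example runs offline.

```python
from collections import namedtuple

import numpy as np
import pandas as pd

# Hypothetical stub that mimics the shape of a geopy result (not the real API object).
Location = namedtuple("Location", ["latitude", "longitude"])
FAKE_RESULTS = {"60311, Germany": Location(50.111296, 8.68318)}


def fake_geocode(address):
    """Return a Location for known addresses, None otherwise (like geocode())."""
    return FAKE_RESULTS.get(address)


def convert_safe(row):
    location = fake_geocode(row["Address"])
    if location is None:
        # No match: leave both cells empty instead of raising.
        return pd.Series([np.nan, np.nan])
    return pd.Series([location.latitude, location.longitude])


df = pd.DataFrame({"Address": ["60311, Germany", "no such place"]})
df[["ADDRESS_LAT", "ADDRESS_LONG"]] = df.apply(convert_safe, axis=1)

print(df)
```

With the real geolocator, the same check would go right after the geolocator.geocode(...) call.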

Or add the values to the row inside the function and return the whole row:

def convert(row):
    location_geocode = geolocator.geocode(row['Address'])
    row['ADDRESS_LAT']  = location_geocode.latitude
    row['ADDRESS_LONG'] = location_geocode.longitude
    return row

df = df.apply(convert, axis=1)

print(df)