I am calculating commute distances (home to offices) for all employees. This works for a single office with this piece of code:
# Calculate distances from home to all other offices
def distances_to_offices(row):
    row['distance_to_office'] = round(distance.distance(row['Home_geocode'], row['Office_geocode']).km, 0)
    return row

df_joined.apply(distances_to_offices, axis=1)
I have many offices, though, and would like to loop over them, creating a new distance column for each office. In the working version, the function is passed to apply() without any arguments, even though "row" is a parameter in the function definition. When I try to pass a city name as well, I have to call the function with the same number of arguments as in the definition, but that fails because "row" is not recognized as an argument:
NameError: name 'row' is not defined
I don't understand why "row" works as a parameter in the function definition but not when I try to call the function with that argument. Can anyone shed some light on this? I am thinking of something like this, but I am struggling to choose the right arguments:
# throws NameError:
def distances_to_offices(row, city):
    col_name = city + "_distance"
    row[col_name] = round(distance.distance(row['Home_geocode'], row['Office_geocode']).km, 0)
    return row

offices = ['NY', 'Rio', 'Tokyo']
for city in offices:
    df_joined.apply(distances_to_offices(row, city), axis=1)
CodePudding user response:
I can't say for sure because I don't know what df_joined is, but it looks like an object representing a data table (most likely a pandas DataFrame) on which you can call apply(), passing in a function. apply() then loops through the table and, with axis=1, calls the function you supplied once per row, passing that row in as the argument.
The first way you did it works because you are passing the NAME of the function distances_to_offices to apply(), and apply() is calling it with a row which gets defined in some for loop that it sets up.
The second way you did it fails because you are no longer passing the NAME of distances_to_offices, but instead CALLING it with arguments row and city, then passing the result of that function call as an argument to apply(). That can't work because you haven't defined what row is, but Python needs to know what row is in order to do the function call distances_to_offices(row, city), which happens BEFORE apply() sees it and has an opportunity to define row.
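The difference between passing a function and calling it can be reproduced without any data table at all. The sketch below uses a hypothetical helper, call_for_each, as a simplified stand-in for apply(); it is not how any particular library implements it, and the names label and rows are made up for illustration.

```python
def call_for_each(rows, func):
    """Hypothetical stand-in for apply(): calls func once per row."""
    return [func(row) for row in rows]


def label(row, city):
    """Toy two-argument function, playing the role of distances_to_offices."""
    return f"{city}:{row}"


rows = ["r1", "r2"]

# Passing a function object works: call_for_each supplies `row` itself.
# The lambda fixes `city` up front and leaves `row` open.
results = call_for_each(rows, lambda row: label(row, "NY"))

# Calling the function first fails: Python evaluates label(row, "NY")
# before call_for_each runs, and no variable named `row` exists here.
try:
    call_for_each(rows, label(row, "NY"))
except NameError as err:
    error_message = str(err)
```

Here results holds one labelled value per row, while error_message holds the same kind of NameError text as in the question.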
I'm not sure how df_joined.apply() works, so without more information I can't say the right way to do this. One strategy would be to create a new data table object in each iteration of the loop by filtering df_joined to retain only rows that have a city matching city.
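If df_joined does turn out to be a pandas DataFrame, another option is to keep passing the function itself and let apply() forward the extra argument through its args parameter. The following is only a sketch under that assumption: the sample rows, the office_geocodes dictionary and the haversine_km helper are hypothetical, and haversine_km merely stands in for the distance call used in the question.

```python
import math

import pandas as pd


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs.

    Stand-in for the distance call in the question, so that this
    sketch only depends on pandas.
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Hypothetical office coordinates as (lat, lon); the question stores
# office locations in a column instead.
office_geocodes = {
    "NY": (40.7128, -74.0060),
    "Rio": (-22.9068, -43.1729),
    "Tokyo": (35.6762, 139.6503),
}

# Hypothetical sample data standing in for the real df_joined.
df_joined = pd.DataFrame({
    "employee": ["A", "B"],
    "Home_geocode": [(40.7128, -74.0060), (35.0, 139.0)],
})


def distance_to_office(row, city):
    """Return the rounded distance from this row's home to the given office."""
    return round(haversine_km(row["Home_geocode"], office_geocodes[city]), 0)


for city in office_geocodes:
    # Pass the function itself; apply() supplies `row`, args supplies `city`.
    df_joined[city + "_distance"] = df_joined.apply(
        distance_to_office, axis=1, args=(city,)
    )
```

Returning a single value per row and assigning the result to a new column avoids having to modify the row inside the function. A lambda such as lambda row: distance_to_office(row, city) would work in place of args as well.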