I'm new to Python and am trying to categorize the places in df1 by their distance to the places in df2, but something is going wrong.
I have two DataFrames with the coordinates of places:
import pandas as pd
import geopy.distance
# Places to classify
df1 = pd.DataFrame([['a', 55.88, 37.48],
                    ['b', 55.88, 37.53],
                    ['c', 55.89, 37.45]],
                   columns=['name', 'lat', 'lng'])
# Reference places to measure the distance to
df2 = pd.DataFrame([['f', 55.81, 37.12],
                    ['g', 55.79, 37.23],
                    ['h', 55.23, 37.21]],
                   columns=['name', 'lat', 'lng'])
print(df1)
print(df2)
df1
name | lat | lng |
---|---|---|
a | 55.88 | 37.48 |
b | 55.88 | 37.53 |
c | 55.89 | 37.45 |
df2
name | lat | lng |
---|---|---|
f | 55.81 | 37.12 |
g | 55.79 | 37.23 |
h | 55.23 | 37.21 |
I want to calculate the distance between a and each of f, g, h. If the distance to at least one of these places is less than 1000 m, the row should get the category 'close', otherwise 'far'. The same should be done for every name in df1.
I want this DataFrame as the result:
print(df1)
name | lat | lng | dist_to_place |
---|---|---|---|
a | 55.88 | 37.48 | far |
b | 55.88 | 37.53 | close |
c | 55.89 | 37.45 | far |
I tried this construction:
def dist(df1):
    for i in range(len(df1)):
        for j in range(len(df2)):
            if geopy.distance.geodesic(
                    tuple(df1[['lat','lng']].iloc[i]),
                    tuple(df2[['lat','lng']].iloc[j])).m < 1000:
                return 'close'
            else:
                return 'far'
df1['dist_to_place'] = df1.apply(dist, axis=1)
but I get the error: 'float' object is not iterable
Please help me :C
Solution:
def dist(df1_row):
    for j in range(len(df2)):
        if geopy.distance.geodesic(
                tuple(df1_row[['lat','lng']]),
                tuple(df2[['lat','lng']].iloc[j])).m < 1000:
            return 'close'
    return 'far'
df1['dist_to_place'] = df1.apply(dist, axis=1)
CodePudding user response:
import pandas as pd
import geopy.distance
df1 = pd.DataFrame({'name':['a', 'b'],'lat':[56.34, 76.56], 'lng':[23.42, 45.34]})
df2 = pd.DataFrame({'name':['f', 'g'],'lat':[56.45, 76.55], 'lng':[27.42, 40.34]})
def dist(df1_row):
    for j in range(len(df2)):
        if geopy.distance.geodesic(
                tuple(df1_row[['lat','lng']]),
                tuple(df2[['lat','lng']].iloc[j])).m < 1000:
            return 'close'
    return 'far'
df1['dist_to_place'] = df1.apply(dist, axis=1)
- When you use apply with axis=1, each row is passed to your function, not the whole DataFrame.
- You need to return 'far' after the loop, not inside it. Otherwise the function returns after checking only the first place in df2.
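Both points can be reproduced without pandas or geopy. The sketch below is only an illustration under stated assumptions: haversine_m is a hypothetical helper that approximates the distance on a sphere (it is a stand-in for geopy.distance.geodesic, not the same calculation), and the places are kept as plain tuples instead of DataFrame rows.

```python
import math

# Point 1: a single element of a row is a float, and tuple() cannot iterate over it.
try:
    tuple(55.88)
except TypeError as exc:
    error_message = str(exc)

def haversine_m(p, q):
    """Approximate great-circle distance in metres between two (lat, lng) pairs.

    Assumption: spherical Earth with mean radius 6371008.8 m. This is a
    stand-in for geopy's geodesic distance, not an equivalent.
    """
    lat1, lng1 = math.radians(p[0]), math.radians(p[1])
    lat2, lng2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371008.8 * math.asin(math.sqrt(a))

# Sample coordinates from the question as (name, lat, lng) tuples.
places_1 = [('a', 55.88, 37.48), ('b', 55.88, 37.53), ('c', 55.89, 37.45)]
places_2 = [('f', 55.81, 37.12), ('g', 55.79, 37.23), ('h', 55.23, 37.21)]

def classify(place, others, limit_m=1000):
    """Return 'close' if any place in others is nearer than limit_m, else 'far'."""
    _, lat, lng = place
    for _, other_lat, other_lng in others:
        if haversine_m((lat, lng), (other_lat, other_lng)) < limit_m:
            return 'close'  # one nearby place is enough, stop early
    # Point 2: 'far' is returned only after every place has been checked.
    return 'far'

labels = [classify(place, places_2) for place in places_1]
print(labels)
```

Under this approximation all three sample places come out as 'far', because every pair in the sample data is much more than 1000 m apart. A point such as (55.815, 37.12), which lies a few hundred metres from f, is classified as 'close'.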
CodePudding user response:
The error happens because df1.apply(dist, axis=1) calls dist once per row and passes that row as a Series, not the whole DataFrame. Selecting a single element of the row therefore yields a float, and tuple() cannot iterate over a float. Returning a string such as 'close' or 'far' from dist is fine: apply collects the returned values into a Series that can be assigned to a new column. An alternative is to build a list of labels for each row, one per place in df2, and then reduce that list to a single value:
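import geopy.distance
def dist(row):
    # Collect one label per place in df2
    result = []
    for j in range(len(df2)):
        if geopy.distance.geodesic(
                (row['lat'], row['lng']),
                (df2.loc[j, 'lat'], df2.loc[j, 'lng'])).m < 1000:
            result.append('close')
        else:
            result.append('far')
    return result
# A row is 'close' if at least one place in df2 is within 1000 m
df1['dist_to_place'] = df1.apply(dist, axis=1).apply(lambda labels: 'close' if 'close' in labels else 'far')
The code above calculates the distance between each place in df1 and every place in df2, then labels the row 'close' if any of those distances is under 1000 m and 'far' otherwise. Taking only the first label (x[0]) would compare each row with the first place in df2 only. Note that df2.loc[j, ...] assumes df2 has the default 0..n-1 index.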
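How the per-place labels are reduced to one value decides the result. A minimal sketch with a hypothetical list of labels (the values are made up for illustration and do not come from the question's data) shows the difference between taking the first label and checking whether any label is 'close':

```python
def first_label(labels):
    """Keep only the label for the first place in df2."""
    return labels[0]

def any_close(labels):
    """Return 'close' if at least one label is 'close', otherwise 'far'."""
    return 'close' if 'close' in labels else 'far'

# Hypothetical labels for one row of df1, one entry per place in df2.
labels = ['far', 'close', 'far']

print(first_label(labels))
print(any_close(labels))
```

Only the second reduction matches the requirement in the question that a place counts as 'close' when at least one place in df2 is within 1000 m.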