How to utilize .apply on functions from geopy in pandas when creating a new column from existing columns

Time:08-22

I am trying to find a more efficient way of doing a task I have already written some code for. The code uses four columns of a pandas DataFrame (LATITUDE, LONGITUDE, LATITUDE_YORK_UNIVERSITY, LONGITUDE_YORK_UNIVERSITY) to create a new column holding the distance in kilometers between two coordinates: the first coordinate is (LATITUDE, LONGITUDE) and the second is (LATITUDE_YORK_UNIVERSITY, LONGITUDE_YORK_UNIVERSITY).

A link showing what the table looks like

Right now I build a list with the code below (geopy and pandas iterrows), convert it into a column, and concatenate that column to the DataFrame. This is cumbersome; I know there is an easier way using .apply and the geopy function, but I haven't been able to figure out the syntax.

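from geopy.distance import geodesic as GD

# result is the existing DataFrame holding the four coordinate columns
distances = []  # one distance in km per row (avoids shadowing the built-in `list`)
for _, row in result.iterrows():
    coordinate1 = (row['LATITUDE'], row['LONGITUDE'])
    coordinate2 = (row['LATITUDE_YORK_UNIVERSITY'], row['LONGITUDE_YORK_UNIVERSITY'])
    distances.append(GD(coordinate1, coordinate2).km)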
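For comparison, here is a self-contained sketch of the loop above together with the "convert into a column and concatenate" step the question describes. It assumes a tiny made-up DataFrame, and because geopy may not be installed it swaps in a spherical haversine (great-circle) formula, `haversine_km` (hypothetical, mean Earth radius 6371 km), for geopy's ellipsoidal `geodesic`, so the numbers differ slightly from geopy's. The column name `DISTANCE_KM` is also only illustrative.

```python
import math

import pandas as pd


def haversine_km(p1, p2, radius_km=6371.0):
    """Great-circle distance on a sphere; a hypothetical stand-in for geopy's geodesic."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Made-up sample data: two points compared against the same reference point
result = pd.DataFrame({
    'LATITUDE': [43.65, 45.50],
    'LONGITUDE': [-79.38, -73.57],
    'LATITUDE_YORK_UNIVERSITY': [43.77, 43.77],
    'LONGITUDE_YORK_UNIVERSITY': [-79.50, -79.50],
})

# Loop-based approach from the question
distances = []
for _, row in result.iterrows():
    coordinate1 = (row['LATITUDE'], row['LONGITUDE'])
    coordinate2 = (row['LATITUDE_YORK_UNIVERSITY'], row['LONGITUDE_YORK_UNIVERSITY'])
    distances.append(haversine_km(coordinate1, coordinate2))

# "Convert into a column and concatenate": pd.concat along the column axis...
with_concat = pd.concat(
    [result, pd.Series(distances, index=result.index, name='DISTANCE_KM')], axis=1
)

# ...produces the same frame as assigning the list directly as a new column
result['DISTANCE_KM'] = distances
```

Direct assignment avoids building and concatenating a separate object, but the distances are still computed in a per-row Python loop.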

Answer:

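TL;DR

df.apply(lambda x: GD(x.iloc[:2], x.iloc[2:]).km, axis=1)

Here GD is geopy's geodesic (imported as in the question), and the four columns are ordered latitude, longitude, latitude, longitude.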

Some explanation

Let's say we have a function that requires two tuples as arguments. For example:

from math import dist

def distance(point1: tuple, point2: tuple) -> float:
    
    # suppose the developer checks the types,
    # so only tuples can be passed as arguments
    assert type(point1) is tuple
    assert type(point2) is tuple

    return dist(point1, point2)
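To see why a conversion step is needed before using this function with apply, here is a small sketch (sample values assumed) showing that this tuple-checking `distance` accepts plain tuples but rejects a slice of a pandas row, because that slice is a Series rather than a tuple.

```python
from math import dist

import pandas as pd


def distance(point1: tuple, point2: tuple) -> float:
    # Same tuple-only checks as the function above
    assert type(point1) is tuple
    assert type(point2) is tuple
    return dist(point1, point2)


# Plain tuples pass the checks
ok = distance((0, 0), (3, 4))

# A positional slice of a pandas row is a Series, so the type check fails
row = pd.Series({'lat': 0, 'long': 0, 'Y lat': 3, 'Y long': 4})
try:
    distance(row.iloc[:2], row.iloc[2:])
    rejected = False
except AssertionError:
    rejected = True
```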

Let's apply the function to this data:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.arange(10*4).reshape(10, 4),
    columns=['lat', 'long', 'Y lat', 'Y long']
)

We pass two arguments to apply: axis=1 to iterate over rows, and a lambda that wraps distance. To split each row into tuples we can use tuple(...) or (*...,); note the trailing comma in the latter form:

df.apply(lambda x: distance((*x.iloc[:2],), (*x.iloc[2:],)), axis=1)
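A runnable sketch of this step, rebuilding the demo frame with latitude before longitude (that column order is an assumption of the sketch). Inside `apply(..., axis=1)` each `x` is one row as a Series indexed by column names, so `x.iloc[:2]` and `x.iloc[2:]` are the two points. Every row here is `[4i, 4i+1, 4i+2, 4i+3]`, so each result is the Euclidean distance from `(4i, 4i+1)` to `(4i+2, 4i+3)`, which is sqrt(8), about 2.83.

```python
from math import dist

import numpy as np
import pandas as pd


def distance(point1: tuple, point2: tuple) -> float:
    # Tuple-only function from above
    assert type(point1) is tuple
    assert type(point2) is tuple
    return dist(point1, point2)


df = pd.DataFrame(
    data=np.arange(10 * 4).reshape(10, 4),
    columns=['lat', 'long', 'Y lat', 'Y long'],
)

# With axis=1, each x is one row: a Series indexed by the column names
first_row = df.iloc[0]
# Both spellings build the same tuple from a positional slice of the row
same_tuple = tuple(first_row.iloc[:2]) == (*first_row.iloc[:2],)

distances = df.apply(lambda x: distance((*x.iloc[:2],), (*x.iloc[2:],)), axis=1)
```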

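The thing is that geopy's distance functions don't require tuples as arguments: any iterable with 2 or 3 elements (latitude, longitude and optionally altitude) will do, because geopy converts each argument into its internal Point type before measuring the distance. So with geopy's geodesic (imported as GD in the question) in place of the tuple-checking function, and latitude before longitude in each pair, we can simplify this to:

df.apply(lambda x: GD(x.iloc[:2], x.iloc[2:]).km, axis=1)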
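Since geopy may not be available everywhere, this sketch illustrates the same idea with the standard library's `math.dist`, which, like geopy's distance functions, accepts any iterable of coordinates rather than only tuples. The Series slices are passed straight through, and the result matches the version that converts to tuples first. The demo frame and its column names are the assumed ones used above.

```python
from math import dist

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.arange(10 * 4).reshape(10, 4),
    columns=['lat', 'long', 'Y lat', 'Y long'],
)

# No tuple conversion: each Series slice is passed as an iterable of coordinates
relaxed = df.apply(lambda x: dist(x.iloc[:2], x.iloc[2:]), axis=1)

# Explicit tuple conversion, as required by the tuple-checking version
strict = df.apply(lambda x: dist(tuple(x.iloc[:2]), tuple(x.iloc[2:])), axis=1)
```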

To make it independent of the column order, we can select the columns by name instead (using the names from the question):

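from geopy.distance import geodesic as GD

common_point = ['LATITUDE','LONGITUDE']
york_point = ['LATITUDE_YORK_UNIVERSITY','LONGITUDE_YORK_UNIVERSITY']
# store the distances in a new column ('DISTANCE_KM' is an illustrative name)
result['DISTANCE_KM'] = result.apply(lambda x: GD(x[common_point], x[york_point]).km, axis=1)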
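A self-contained sketch of this name-based version that also stores the result as a new column, which is what the question asks for. As before, it assumes made-up coordinates and swaps in a hypothetical spherical `haversine_km` for geopy's `geodesic` (with geopy installed, `GD(p1, p2).km` would take its place); `DISTANCE_KM` is an illustrative column name. The last step reorders the columns to check that selecting by name makes the result independent of column order.

```python
import math

import pandas as pd


def haversine_km(p1, p2, radius_km=6371.0):
    """Great-circle distance on a sphere; a hypothetical stand-in for geopy's geodesic."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


common_point = ['LATITUDE', 'LONGITUDE']
york_point = ['LATITUDE_YORK_UNIVERSITY', 'LONGITUDE_YORK_UNIVERSITY']

# Made-up sample data
result = pd.DataFrame({
    'LATITUDE': [43.65, 45.50],
    'LONGITUDE': [-79.38, -73.57],
    'LATITUDE_YORK_UNIVERSITY': [43.77, 43.77],
    'LONGITUDE_YORK_UNIVERSITY': [-79.50, -79.50],
})

# Label-based selection: each slice comes out ordered (latitude, longitude)
result['DISTANCE_KM'] = result.apply(
    lambda x: haversine_km(x[common_point], x[york_point]), axis=1
)

# Same computation on a frame whose columns are in a different order
shuffled = result[york_point + ['LONGITUDE', 'LATITUDE']]
shuffled_km = shuffled.apply(
    lambda x: haversine_km(x[common_point], x[york_point]), axis=1
)
```

Note that apply with axis=1 still calls the Python function once per row: it is more concise than iterrows, but it is not vectorized.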