How to utilize .apply on functions from geopy in pandas when creating a new column from existing columns

So I am trying to find a more efficient way of doing a task I already wrote some code for. The purpose of the code is to use 4 columns (LATITUDE, LONGITUDE, LATITUDE_YORK_UNIVERSITY, LONGITUDE_YORK_UNIVERSITY) to create a new column containing the distance in kilometers between two coordinates in a pandas DataFrame. The first coordinate is (LATITUDE, LONGITUDE) and the second coordinate is (LATITUDE_YORK_UNIVERSITY, LONGITUDE_YORK_UNIVERSITY).

Right now I complete the task by building a list with the following code (geopy and pandas iterrows), converting it into a column and concatenating that to the dataframe. This is cumbersome. I know there is an easier way using .apply and the geopy function, but I haven't been able to figure out the syntax.

from geopy.distance import geodesic as GD
# named `distances` rather than `list`, so the built-in list is not shadowed
distances = []
for index, row in result.iterrows():
    coordinate1 = (row['LATITUDE'], row['LONGITUDE'])
    coordinate2 = (row['LATITUDE_YORK_UNIVERSITY'], row['LONGITUDE_YORK_UNIVERSITY'])
    distances.append(GD(coordinate1, coordinate2).km)

CodePudding user response：

TL;DR

df.apply(lambda x: distance(x[:2], x[2:]), axis=1)

Some explanation

Let's say we have a function that requires two tuples as arguments. For example:

from math import dist

def distance(point1: tuple, point2: tuple) -> float:
    
    # suppose that developer checks the type
    # so we can pass only tuples as arguments
    assert type(point1) is tuple
    assert type(point2) is tuple

    return dist(point1, point2)
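As a quick check of that type requirement, here is a small sketch that calls the same function first with tuples and then with lists. The sample points are made up for illustration:

```python
from math import dist

def distance(point1: tuple, point2: tuple) -> float:
    # same type check as in the function above
    assert type(point1) is tuple
    assert type(point2) is tuple
    return dist(point1, point2)

# Tuples pass the check: a 3-4-5 right triangle gives 5.0
print(distance((0, 0), (3, 4)))  # 5.0

# Lists hold the same numbers but fail the type check
try:
    distance([0, 0], [3, 4])
    rejected = False
except AssertionError:
    rejected = True
    print("lists are rejected by the type check")
```

This is why the wrapper passed to apply has to convert each part of the row into a tuple first.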

Let's apply the function to this data:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.arange(10*4).reshape(10, 4),
    columns=['long', 'lat', 'Y long', 'Y lat']
)

We pass two parameters to apply: axis=1 to iterate over rows, and a lambda function that wraps distance. To split the row into tuples we can use tuple(...) or (*...,); note the trailing comma in the latter option:

df.apply(lambda x: distance((*x[:2],), (*x[2:],)), axis=1)
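A minimal, self-contained sketch of the two spellings side by side, using the toy data and the tuple-only function from above. It slices with .iloc, which is my own choice to make the positional slicing explicit, and the names with_tuple and with_star are only illustrative:

```python
from math import dist

import numpy as np
import pandas as pd

def distance(point1: tuple, point2: tuple) -> float:
    # accepts tuples only, as in the example above
    assert type(point1) is tuple
    assert type(point2) is tuple
    return dist(point1, point2)

df = pd.DataFrame(
    data=np.arange(10 * 4).reshape(10, 4),
    columns=['long', 'lat', 'Y long', 'Y lat'],
)

# Option 1: build the tuples with tuple(...)
with_tuple = df.apply(
    lambda x: distance(tuple(x.iloc[:2]), tuple(x.iloc[2:])), axis=1
)

# Option 2: build the tuples by unpacking; the trailing comma makes it a tuple
with_star = df.apply(
    lambda x: distance((*x.iloc[:2],), (*x.iloc[2:],)), axis=1
)

# Both spellings give the same Series of row-wise distances
print(with_tuple.equals(with_star))
```

Either result can be assigned straight to a new column, for example df['dist'] = with_tuple.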

The thing is that geopy.distance doesn't strictly require tuples as arguments: they can be any iterables with 2 to 3 elements (see how an argument is converted into the internal Point type when a distance is defined). Keep in mind that geopy expects each point in (latitude, longitude) order. So, with geopy's distance in place of the function above, we can simplify this to:

df.apply(lambda x: distance(x[:2], x[2:]), axis=1)

To make it independent of the column order, we could write this (in your terms):

common_point = ['LATITUDE','LONGITUDE']
york_point = ['LATITUDE_YORK_UNIVERSITY','LONGITUDE_YORK_UNIVERSITY']
result.apply(lambda x: GD(x[common_point], x[york_point]).km, axis=1)