Vectorising max distance function

Time:11-04

Really quick question,

I have the following function for distance

from math import sqrt

def distance(a1,a2,b1,b2):
    return sqrt((a2-a1)**2 + (b2-b1)**2)

Each row of my dataframe is a point, with its coordinates in column A and column B. I want to calculate the distance from each point to every other point and save the maximum in column C.

For now I iterate through each row in a nested loop, call distance(dftest.loc[i, colA], dftest.loc[i, colB], dftest.loc[j, colA], dftest.loc[j, colB]) and check whether the result is greater than the previous maximum. I know there is a way to vectorise it, I just can't get my head around it.

I don't need a ready-made function, just a clear explanation of how to vectorise it, please.

I appreciate any help!

EDIT: Example of the dataframe, with ColC being the desired output:

ColA| ColB| ColC
7.6 |8.2  |6.79 (max distance which is between this row and row3)
6.6 |4.4  |3.92 (max distance is with row1, greater than row3)
4.4 |2.2  |6.79 (max distance is with row1)

So e.g. ColC in the first row is calculated with distance(7.6,8.2,4.4,2.2), but the distance function has to go through all combinations.

With larger dfs that gets really expensive.
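
For reference, the nested-loop approach described above can be sketched roughly as follows. This is only an illustration: the helper name max_distances_loop is hypothetical, and plain lists stand in for the two dataframe columns (which could be obtained with something like df['ColA'].tolist()).

```python
from math import hypot

def max_distances_loop(xs, ys):
    """For each point i, return the largest Euclidean distance to any point j.

    Hypothetical helper: xs and ys are equal-length sequences holding the
    ColA and ColB values of each row.
    """
    n = len(xs)
    result = []
    for i in range(n):
        best = 0.0  # the distance from a point to itself is 0
        for j in range(n):
            d = hypot(xs[j] - xs[i], ys[j] - ys[i])
            if d > best:
                best = d
        result.append(best)
    return result

# Sample data from the question (ColA, ColB)
col_a = [7.6, 6.6, 4.4]
col_b = [8.2, 4.4, 2.2]
col_c = max_distances_loop(col_a, col_b)
```

The inner loop runs n times for each of the n rows, so the number of distance calls grows with the square of the row count, which is why it becomes slow on larger dataframes.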

CodePudding user response:

Since you don't want a ready-made function, you can work on the underlying numpy array, broadcast the computation over all pairs of rows and take the max:

import numpy as np
a = df.values  # easier reference to numpy array
# pairwise squared distances, shape (n, n): (a2-a1)**2 + (b2-b1)**2
b = (a[:,0]-a[:,0,None])**2 + (a[:,1]-a[:,1,None])**2
df['ColC'] = np.sqrt(b.max(0))  # max over each column, then a single sqrt

output:

   ColA  ColB      ColC
0   7.6   8.2  6.800000
1   6.6   4.4  3.929377
2   4.4   2.2  6.800000
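
To make the broadcasting step easier to follow, here is the same idea split into named intermediate arrays. It is a sketch rather than a drop-in replacement: the function name max_distances_broadcast is hypothetical, and x and y are assumed to be the ColA and ColB values as 1-D arrays (for example from df['ColA'].to_numpy()).

```python
import numpy as np

def max_distances_broadcast(x, y):
    """Largest Euclidean distance from each point to any other point.

    Hypothetical helper: x and y are 1-D array-likes of equal length n.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # x[:, None] has shape (n, 1) and x[None, :] has shape (1, n);
    # subtracting them broadcasts to an (n, n) matrix where
    # dx[i, j] = x[i] - x[j]. The same holds for dy.
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]

    # Squared distance between every pair of points, shape (n, n).
    sq = dx**2 + dy**2

    # sqrt is monotonic, so take the row-wise max first and apply
    # sqrt to n values instead of n*n.
    return np.sqrt(sq.max(axis=1))

# Sample data from the question (ColA, ColB)
x = np.array([7.6, 6.6, 4.4])
y = np.array([8.2, 4.4, 2.2])
col_c = max_distances_broadcast(x, y)
```

On the sample data this should agree with the output above (about 6.8, 3.93 and 6.8). Note that the intermediate matrices hold n*n values, so memory use grows quadratically with the number of rows; for very large dataframes the rows would need to be processed in chunks.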