Replace a 2D point in one dataframe with a 2D point in another dataframe if the Euclidean distance between them is the lowest

I have a data frame df1 with two columns V1 and V2 representing two coordinates of a point.

df1

V1          V2
1.30344679  0.060199021
1.256628917 0.095897457
0.954959945 0.237514922
1.240081297 0.053228255
1.35765432  0.033412217
1.228539425 0.079924064
1.080489363 0.204162117
1.27587021  0.085286683
1.44        0
0.93719247  0.310292371

There's another dataframe df2 with two columns C1 and C2 representing two coordinates of a point.

df2

C1          C2
0.083       0.323657888
1.293934451 0.046950426
1.252872503 0.09000528
0.148131303 0.347930828

df1 and df2 have different lengths. In this example, four points in df1 will be replaced: each of the four points in df2 replaces the point in df1 with the lowest Euclidean distance to it.
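
For reference, take p = (p_1, p_2) to be a row of df1 (V1, V2) and q = (q_1, q_2) a row of df2 (C1, C2). The Euclidean distance being minimised is:

```latex
d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2}
```

Each df2 point q then replaces the df1 row p that minimises d(p, q).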

In other words, each point in df2 replaces only its closest point in df1. How can we achieve this?
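
The basic step is finding the nearest df1 row for a single df2 point. Here is a minimal sketch of that step using `math.dist` (Python 3.8+), with the question's df1 rows as tuples. The helper name `closest_index` is hypothetical and is not part of the answer below.

```python
import math

# Sample points from the question: df1 rows as (V1, V2), one df2 row as (C1, C2).
df1 = [
    (1.30344679, 0.060199021),
    (1.256628917, 0.095897457),
    (0.954959945, 0.237514922),
    (1.240081297, 0.053228255),
    (1.35765432, 0.033412217),
    (1.228539425, 0.079924064),
    (1.080489363, 0.204162117),
    (1.27587021, 0.085286683),
    (1.44, 0.0),
    (0.93719247, 0.310292371),
]
p2 = (1.293934451, 0.046950426)

def closest_index(point, points):
    """Return the index of the row in `points` nearest to `point` (Euclidean).
    On an exact tie, min() keeps the first (lowest) index it sees."""
    return min(range(len(points)), key=lambda i: math.dist(point, points[i]))

print(closest_index(p2, df1))  # prints 0
```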

Duplicate issue: the values have 9 digits after the decimal point, so I assume duplicates will not arise (i.e., more than one point in df1 having the same, lowest, Euclidean distance to a df2 point). If they do arise, can we replace any one of those rows at random?
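
One way to honour "replace any one of those rows at random" is to collect every row whose distance matches the minimum within a small tolerance and pick one of them with `random.choice`. The sketch below assumes this approach. The helper name `closest_index_random_tie`, the tolerance `rel_tol=1e-12`, and the sample points are all illustrative. Sorting `(distance, index)` tuples, as the answer below does, instead breaks exact ties by the lower index, not at random.

```python
import math
import random

def closest_index_random_tie(point, points, rel_tol=1e-12, rng=random):
    """Index of the nearest row; if several rows are (almost) equally near,
    pick one of them uniformly at random."""
    dists = [math.dist(point, p) for p in points]
    best = min(dists)
    # Every index whose distance equals the minimum within the tolerance.
    tied = [i for i, d in enumerate(dists) if math.isclose(d, best, rel_tol=rel_tol)]
    return rng.choice(tied)

# Made-up rows: (0, 0) and (2, 0) are both at distance 1 from (1, 0).
points = [(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
print(closest_index_random_tie((1.0, 0.0), points))  # prints 0 or 1
```

Passing `rng=random.Random(seed)` makes the choice reproducible.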

Desired output: revised_df1, the same length as df1 but with four of its points replaced by the points from df2.

Answer:

Here's a solution that works with the data as lists. Modifying it to work with a dataframe is an exercise left to the reader. Honestly, since this needs to be done row by row, it might be better to pull the columns out as lists and convert them back later.
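
The "pull the columns out as lists and convert them back" round trip can be sketched as follows. This assumes, purely for illustration, that each data frame is a plain dict mapping column names to lists. If they are pandas DataFrames (the question does not say), the equivalents would be `df1[["V1", "V2"]].values.tolist()` going out and `pd.DataFrame(rows1, columns=["V1", "V2"])` coming back. The data is a three-row excerpt of the question's df1 and one df2 point.

```python
import math

# Hypothetical stand-in for the data frames: column name -> list of values.
df1 = {"V1": [1.30344679, 1.256628917, 0.954959945],
       "V2": [0.060199021, 0.095897457, 0.237514922]}
df2 = {"C1": [1.252872503], "C2": [0.09000528]}

# Pull the coordinate columns out as lists of [x, y] rows.
rows1 = [list(p) for p in zip(df1["V1"], df1["V2"])]
rows2 = [list(p) for p in zip(df2["C1"], df2["C2"])]

# Row-by-row greedy replacement: each df2 point takes the nearest
# df1 row that has not been replaced yet.
replaced = set()
for p2 in rows2:
    i = min((i for i in range(len(rows1)) if i not in replaced),
            key=lambda i: math.dist(rows1[i], p2))
    rows1[i] = p2
    replaced.add(i)

# Convert back: split the rows into columns with the original names.
revised_df1 = {"V1": [r[0] for r in rows1], "V2": [r[1] for r in rows1]}
print(revised_df1)
```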

Note that this greedy approach does NOT guarantee the "optimal" solution. For each point in df2, in order, we pick the closest point in df1 that has not already been replaced. A different assignment could well give a smaller TOTAL distance.
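
The sketch below makes this caveat concrete. On a tiny made-up example, it compares the greedy rule with an exhaustive search over every one-to-one assignment of df2 points to df1 rows. The data and the helper names `greedy_assign`, `optimal_assign` and `total_cost` are hypothetical. Exhaustive search is only feasible for very small inputs. At realistic sizes, minimising the total distance is the linear assignment problem, which `scipy.optimize.linear_sum_assignment` solves from a (possibly rectangular) cost matrix.

```python
import math
from itertools import permutations

def greedy_assign(df1, df2):
    """Greedy rule: each df2 point, in order, takes the nearest df1 row
    not yet taken. Returns {df2 index: df1 index}."""
    taken, assignment = set(), {}
    for j, p2 in enumerate(df2):
        i = min((i for i in range(len(df1)) if i not in taken),
                key=lambda i: math.dist(df1[i], p2))
        taken.add(i)
        assignment[j] = i
    return assignment

def optimal_assign(df1, df2):
    """Try every way of giving each df2 point a distinct df1 row and keep
    the one with the smallest total distance (exponential cost)."""
    best, best_cost = None, math.inf
    for rows in permutations(range(len(df1)), len(df2)):
        cost = sum(math.dist(df1[i], p2) for i, p2 in zip(rows, df2))
        if cost < best_cost:
            best, best_cost = dict(enumerate(rows)), cost
    return best

def total_cost(df1, df2, assignment):
    """Sum of distances between each df2 point and the df1 row it replaces."""
    return sum(math.dist(df1[i], df2[j]) for j, i in assignment.items())

# Made-up example where the greedy choice is not the best overall.
df1 = [(0.0, 0.0), (2.0, 0.0)]
df2 = [(1.5, 0.0), (3.0, 0.0)]
g = greedy_assign(df1, df2)
o = optimal_assign(df1, df2)
print(g, total_cost(df1, df2, g))  # {0: 1, 1: 0} 3.5
print(o, total_cost(df1, df2, o))  # {0: 0, 1: 1} 2.5
```

The greedy rule lets the first df2 point grab row 1 (distance 0.5), which forces the second point onto row 0 (distance 3.0).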

import math

df1 = [
[1.30344679 ,  0.060199021],
[1.256628917,  0.095897457],
[0.954959945,  0.237514922],
[1.240081297,  0.053228255],
[1.35765432 ,  0.033412217],
[1.228539425,  0.079924064],
[1.080489363,  0.204162117],
[1.27587021 ,  0.085286683],
[1.44       ,  0],
[0.93719247 ,  0.310292371]
]

df2 = [
[0.083      ,  0.323657888],
[1.293934451,  0.046950426],
[1.252872503,  0.09000528],
[0.148131303,  0.347930828]
]

def printer(d):
    for row in d:
        print( "%.9f %.9f" % tuple(row) )

# Euclidean distance between two 2D points.
def dist(p1,p2):
    return math.sqrt( (p1[0]-p2[0])**2 + (p1[1]-p2[1])**2 )

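print("Before")
printer(df1)

# For each point in df2, replace the closest point in df1
# that has not already been replaced.
replaced = set()  # indices of df1 rows already replaced
for p2 in df2: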
    # Compute the distance to each point in df1.
    distances = [(dist(p1,p2), i1) for (i1,p1) in enumerate(df1)]
    # Sort them by distance.
    distances.sort()
    # Pick the closest that has not already been replaced.
    top = distances.pop(0)
    while top[1] in replaced:
        top = distances.pop(0)
    # Replace it.
    df1[top[1]] = p2
    replaced.add( top[1] )

print("After")
printer(df1)

Output:

Before
1.303446790 0.060199021
1.256628917 0.095897457
0.954959945 0.237514922
1.240081297 0.053228255
1.357654320 0.033412217
1.228539425 0.079924064
1.080489363 0.204162117
1.275870210 0.085286683
1.440000000 0.000000000
0.937192470 0.310292371
After
1.293934451 0.046950426
1.252872503 0.090005280
0.148131303 0.347930828
1.240081297 0.053228255
1.357654320 0.033412217
1.228539425 0.079924064
1.080489363 0.204162117
1.275870210 0.085286683
1.440000000 0.000000000
0.083000000 0.323657888
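
The loop above depends on the order of df2, because earlier df2 points get first pick. One order-independent alternative is a different heuristic, not the method above and still not guaranteed optimal in general. It sorts every (df1 row, df2 point) pair by distance once, then takes pairs from closest to farthest, skipping any row or point already used. It also builds a new list (revised_df1, as the question asks) instead of overwriting df1. The function name `global_greedy_replace` is hypothetical. On the sample data, by my arithmetic, it pairs df2's first and fourth points with rows 9 and 2 the other way round from the output above, for a slightly smaller total distance.

```python
import math

df1 = [
    [1.30344679, 0.060199021], [1.256628917, 0.095897457],
    [0.954959945, 0.237514922], [1.240081297, 0.053228255],
    [1.35765432, 0.033412217], [1.228539425, 0.079924064],
    [1.080489363, 0.204162117], [1.27587021, 0.085286683],
    [1.44, 0], [0.93719247, 0.310292371],
]
df2 = [
    [0.083, 0.323657888], [1.293934451, 0.046950426],
    [1.252872503, 0.09000528], [0.148131303, 0.347930828],
]

def global_greedy_replace(df1, df2):
    """Return a copy of df1 in which each df2 point replaces one row, taking
    (row, point) pairs in increasing distance order and skipping used ones."""
    pairs = sorted((math.dist(p1, p2), i, j)
                   for i, p1 in enumerate(df1)
                   for j, p2 in enumerate(df2))
    revised = [list(p) for p in df1]  # copy, so df1 itself is left unchanged
    used_rows, used_points = set(), set()
    for _, i, j in pairs:
        if i in used_rows or j in used_points:
            continue
        revised[i] = list(df2[j])
        used_rows.add(i)
        used_points.add(j)
        if len(used_points) == len(df2):
            break  # every df2 point has been placed
    return revised

revised_df1 = global_greedy_replace(df1, df2)
for row in revised_df1:
    print("%.9f %.9f" % tuple(row))
```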