Trying to find all coordinate points within a certain range-CodePudding

What I am trying to achieve here is that I have one source csv file, filled with coordinates and an additional target csv file with more coordinates from which I want to find all the coordinates in the target csv file that are with in certain range from every single coordinate in the source csv file.

The coordinates are formatted as xx.xxxxxx and yy.yyyyyy.

"lat1" and "long1" are the names of the coordinate columns in the source csv and "lat2" and "long2" are the coordinate columns in the target csv.

import pandas as pd
import numpy as np
import time 
from playsound import playsound

fast_df = pd.read_csv('target.csv') # 2
el_df = pd.read_csv('source.csv') # 1

"""
Commandos:
    
    coords_file.columns - get columns
    coords_file.drop_duplicates() - removes identical rows
    coords_flie.iloc[] - fetch row with index
    coords_file[['OBJEKT_ID', 'EXTERNID', 'DETALJTYP']]
    
"""


def findDistance(row, source_lat, source_long):
    # print(row, source_lat, source_long)
    row_lat = row['lat2']
    row_long = row['long2']
    lat_diff = np.abs(source_lat - row_lat)/0.00001 # divide by 0.00001 to convert to meter
    long_diff = np.abs(source_long - row_long)/0.00001
    row['Distance'] = np.sqrt(lat_diff**2 long_diff**2)
    return row

def findDistance_(source_coordinates, target_coordinates):
    lat_diff = np.abs(source_coordinates[0] - target_coordinates[0])/0.00001 # divide by 0.00001 to convert to meter
    long_diff = np.abs(source_coordinates[1] - target_coordinates[1])/0.00001
    Distance = np.sqrt(lat_diff**2 long_diff**2)
    easyDistanceReader(Distance)
    return Distance

def easyDistanceReader(Distance):
    if Distance > 1000:
        Distance = Distance/1000
        print("Distance:", Distance, "km")
    else:
        print("Distance:", Distance, "m")


def runProgram(target_df, source_df, distans_threshold):
    
    """
    Loop over coord in source.csv 
        --> Find all the coordinates within the interval in target.csv
    """
    
    "Using this in order to skip coordinates in source.csv which are outside the target.csv     area"
    latInterval = min(target_df['lat2']), max(target_df['lat2'])
    longInterval = min(target_df['long2']), max(target_df['long2'])
    
    "Find all relevant coordinates based on the source coordinates"
    source_df = source_df.loc[(source_df['lat1'].between(min(latInterval), max(latInterval))) &     (source_df['long1'].between(min(longInterval), max(longInterval)))]

    dataframes = []
    start = time.time()
    for index in range(len(source_df)):
        row = source_df.iloc[index]
        source_coordinates = row[['lat1','long1']]
        
        indices = []
        target_df = target_df.apply(findDistance, args=(row['lat1'],row['long1']), axis=1)
        
        relevantTargets = target_df.loc[target_df['Distance'] < distans_threshold]
        if len(relevantTargets) > 0:
            indices.append(relevantTargets.index[0])

        if len(indices) > 0:
            new_df = target_df.loc[indices]
            dataframes.append(new_df)
        
    final_df = pd.concat(dataframes)


    final_df = final_df.loc[:, final_df.columns != 'Distance'].drop_duplicates()
    print(final_df)
    
    end = time.time()
    print("Elapsed time per iteration:", end-start)
    
    final_df.to_csv('final.csv')
    playsound('audio.mp3')

runProgram(fast_df,el_df, 300) # This number indicates the distance in meters from source coordinates I want to find target coordinates.

The result I'm getting currently is this. This is a result from when I'm running the code at 5000 meters. You can clearly see that a lot of the coordinate points are left out and I cannot figure out why. The black points are source points, the brown target points and the pink are the resulting points.

Any ideas would be greatly appreciated!

CodePudding user response：

The line indices.append(relevantTargets.index[0]) seams to only append the first index of relevantTargets, when you want all of relevantTarget's indices. Try replacing this with indices = [*relevantTargets.index]. However, I don't know why you can't directly do dataframes.append(relevantTargets) here instead.

CodePudding user response：

If I'm thinking about your question correctly, this approach may be simplified by using GeoPandas.sjoin_nearest [link here]. I've used the geopandas.sjoin module in the past to join points to intersecting polygons. I believe sjoin_nearest was just added in the most recent release.