How to search for a point in a text file of many points?

Time:06-19

I have two text files. The first contains a set of points with x and y coordinates; the second contains many more points with x, y and z coordinates, including the points from the first file. I want to look up each point of the first file in the second one to get its z coordinate. I have done this in Python, but it takes far too long. If anyone has a faster approach, I would be glad to see it.

with open('D:/Point cloud data/Data mabna 4/Data_without_RGB/Downsampled/Walls_only_downsampled_5cm.txt') as f1, open("D:/Point cloud data/Data mabna 4/Data_without_RGB/Downsampled/Walls_only_3d.txt", 'a') as f3:

    for line in f1:
        E, N = line.split()
        E = float(E)
        E = float("{0:.3f}".format(E))
        N = float(N)
        N = float("{0:.3f}".format(N))
        with open('D:/Point cloud data/Data mabna 4/Data_without_RGB/Downsampled/Combined_cropped_downsampled_5cm.txt') as f2:
            for row in f2:
                x, y, z = row.split()
                x = float(x)
                x = float("{0:.3f}".format(x))
                y = float(y)
                y = float("{0:.3f}".format(y))

                if E==x and N==y:
                    f3.write(row)

CodePudding user response:

I'm assuming your files are small enough to fit in memory. I'm also assuming you have one file with many points (the one with the z-coordinates, which I will call full_coord) and one file with the points whose z-coordinate you want to find, which I will call partial_coord.

The idea is to read full_coord into memory and put it in a dictionary, so that you can look up a point by (x, y) and get the full (x, y, z) row as a result. Your code re-opens and re-parses the whole second file for every point in the first file, so the work grows with the product of the two file sizes; reading and parsing text is also far slower than working with data that is already in memory. Dictionary lookups in Python (even in large dictionaries) are very fast, O(1) on average, so each file only has to be read once. This should work without issues on a modern machine for a few tens of millions of points.

Finally, based on your code, I'm assuming you are only interested in a precision of 3 decimal places, so all entries can be rounded (I'll round the key only, so the output keeps full precision).
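As a small illustration of rounding only the key, here is a sketch that assumes whitespace-separated fields as in your files. The helper name make_key and the sample values are made up for this example; they are not part of your data.

```python
def make_key(x_text, y_text, precision=3):
    """Build a dictionary key from the x and y fields of a row (hypothetical helper)."""
    return (round(float(x_text), precision), round(float(y_text), precision))

# Made-up sample row; the stored value is the untouched original line
row = "12.34567 7.00012 3.5\n"
x_text, y_text, z_text = row.split()
index = {make_key(x_text, y_text): row}

# A point that agrees to 3 decimal places maps to the same key,
# and the lookup returns the row with its full precision
match = index.get(make_key("12.3461", "7.0001"))
```

Because only the key is rounded, whatever is written to the output file is exactly the line that was read from the input.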

Note that the code below assumes that the (x, y) coordinates of a point define the z coordinate uniquely. If there are several points with the same (x, y) but different z, the code has to be changed (it is an easy change using a defaultdict).

precision = 3
full_coord_filename = r'file_with_z_coord.txt'
partial_coord_filename = r'file_with_partial_coordinates.txt'

full_coord_points = {}
with open(full_coord_filename, 'r') as fh:
  for i, row in enumerate(fh):
    x, y, z = row.split()
    key = (round(float(x), precision), round(float(y), precision))
    if key in full_coord_points:  # Check cases in which the rounding is overlapping points
      print(f'Warning: Found repeated key: {key}, in row {i}')
    full_coord_points[key] = row  # We can save the row as a string

points_to_search = []
with open(partial_coord_filename, 'r') as fh:
  for row in fh:
    x, y = row.split()
    points_to_search.append((round(float(x), precision), round(float(y), precision)))  # Also an (x, y) tuple

# Search for all the points and collect the results. Also collect the points that were not found
found = []
not_found = []
for point in points_to_search:
  try:
    found.append(full_coord_points[point])
  except KeyError:
    not_found.append(point)

# you can save your list of found points
with open('output_file.txt', 'w') as fh:
  for row in found:
    fh.write(row)

# And print the points that were not found
print('The following points were not found:')
for point in not_found:
  print(point)
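If several points can share the same (x, y) but have different z, one possible version of the defaultdict change mentioned above is sketched here. The function names build_index and lookup and the sample rows are assumptions made for illustration; skipping malformed lines is also an added choice, not something your files necessarily need.

```python
from collections import defaultdict

def build_index(rows, precision=3):
    """Map each rounded (x, y) key to the list of all rows that share it."""
    index = defaultdict(list)
    for row in rows:
        fields = row.split()
        if len(fields) != 3:  # skip blank or malformed lines (an assumption)
            continue
        x, y, _z = fields
        index[(round(float(x), precision), round(float(y), precision))].append(row)
    return index

def lookup(index, x, y, precision=3):
    """Return every stored row for the point, or an empty list if there is none."""
    key = (round(float(x), precision), round(float(y), precision))
    return index.get(key, [])  # .get avoids creating empty entries in the defaultdict

# Made-up sample rows: two points share (1.0, 2.0) but differ in z
sample_rows = ["1.0 2.0 0.5\n", "1.0 2.0 3.5\n", "4.0 5.0 6.0\n"]
index = build_index(sample_rows)
matches = lookup(index, "1.0", "2.0")
```

build_index accepts any iterable of lines, so an open file handle can be passed to it directly instead of a list.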