Home > Blockchain >  replace nested for loops combined with conditions to boost performance
replace nested for loops combined with conditions to boost performance

Time:11-19

In order to speed up my code I want to exchange my for loops by vectorization or other recommended tools. I found plenty of examples with replacing simple for loops but nothing for replacing nested for loops in combination with conditions, which I was able to comprehend / would have helped me...

With my code I want to check if points (X, Y coordinates) can be connected by lineaments (linear structures). I started pretty simple but over time the code outgrew itself and is now exhausting slow... Here is an working example of the part taking the most time:

import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import MultiLineString, LineString, Point
from shapely.affinity import rotate
from math import sqrt
from tqdm import tqdm 
import random as rng

# creating random array of points
xys = rng.sample(range(201 * 201), 100)
points = [list(divmod(xy, 201)) for xy in xys]

# plot points
plt.scatter(*zip(*points))

# calculate length for rotating lines -> diagonal of bounds so all points able to be reached
length = sqrt(2)*200

# calculate angles to rotate lines 
angles = []
for a in range(0, 360, 1):
    angle = np.deg2rad(a)
    angles.append(angle)

# copy points array to helper array (points_list) so original array is not manipulated
points_list = points.copy()

# array to save final lines
lines = []

# iterate over every point in points array to search for connecting lines
for point in tqdm(points):
    # delete point from helper array to speed up iteration -> so points do not get
    # double, triple, ... checked
    if len(points_list) > 0:
        points_list.remove(point)
    else:
        break
    # create line from original point to point at end of line (x length) - this line
    # gets rotated at calculated angles
    start = Point(point)
    end = Point(start.x length, start.y)
    line = LineString([start,end])
    # iterate over angle Array to rotate line by each angle
    for angle in angles:
        rot_line = rotate(line, angle, origin=start, use_radians=True)
        lst = list(rot_line.coords)
        # save starting point (a) and ending point(b) of rotated line for np.cross()
        # (cross product to check if points on/near rotated line)
        a = np.asarray(lst[0])
        b = np.asarray(lst[1])
        # counter to count number of points on/near line
        count = 0
        line_list = []
        # iterate manipulated points_list array (only points left for which there has
        # not been a line rotated yet)
        for poi in points_list:
            # check whether point (pio) is on/near rotated line by calculating cross 
            # product (np.corss())
            p = np.asarray(poi)
            cross = np.cross(p-a,b-a)
            # check if poi is inside accepted deviation from cross product
            if cross > -750 and cross < 750:
                # check if more than 5 points (poi) are on/near the rotated line
                if count < 5:
                    line_list.append(poi)
                    count  = 1
                # if 5 points are connected by the rotated line sort the coordinates
                # of the points and check if the length of the line meets the criteria
                else:
                    line_list = sorted(line_list , key=lambda k: [k[1], k[0]])
                    line_length = LineString(line_list)
                    if line_length.length >= 10 and line_length.length <= 150:
                        lines.append(line_list)
                    break

# use shapeplys' MultiLineString to create lines from coordinates and plot them 
# afterwards
multiLines = MultiLineString(lines)

fig, ax = plt.subplots()
ax.set_title("Lines")
for multiLine in MultiLineString(multiLines).geoms:
    # print(multiLine)
    plt.plot(*multiLine.xy)

As mentioned above it was thinking about using pandas or numpy vectorization and therefore build a pandas df for the points and lines (gdf) and one with the different angles (angles) to rotate the lines:

Name Type Size Value
gdf DataFrame (122689, 6) Column name: x, y, value, start, end, line
angles DataFrame (360, 1) Column name: angle

But I ran out of ideas to replace this nested for loops with conditions with pandas vectorization. I found this article on medium and halfway through the article there are conditions for vectorization mentioned and I was wondering if my code maybe is not suitbale for vectorization because of dependencies within the loops...

If this is right, it does not necessarily needs to be vectoriation everything boosting the performance is welcome!

CodePudding user response:

You can quite easily vectorize the most computationally intensive part: the innermost loop. The idea is to compute the points_list all at once. np.cross can be applied on each lines, np.where can be used to filter the result (and get the IDs).

Here is the (barely tested) modified main loop:

for point in tqdm(points):
    if len(points_list) > 0:
        points_list.remove(point)
    else:
        break

    start = Point(point)
    end = Point(start.x length, start.y)
    line = LineString([start,end])

    # CHANGED PART

    if len(points_list) == 0:
        continue

    p = np.asarray(points_list)

    for angle in angles:
        rot_line = rotate(line, angle, origin=start, use_radians=True)
        a, b = np.asarray(rot_line.coords)
        cross = np.cross(p-a,b-a)
        foundIds = np.where((cross > -750) & (cross < 750))[0]

        if foundIds.size > 5:
            # Similar to the initial part, not efficient, but rarely executed
            line_list = p[foundIds][:5].tolist()
            line_list = sorted(line_list, key=lambda k: [k[1], k[0]])
            line_length = LineString(line_list)
            if line_length.length >= 10 and line_length.length <= 150:
                lines.append(line_list)

This is about 15 times faster on my machine.

Most of the time is spent in the shapely module which is very inefficient (especially rotate and even np.asarray(rot_line.coords)). Indeed, each call to rotate takes about 50 microseconds which is simply insane: it should take no more than 50 nanoseconds, that is, 1000 time faster (actually, an optimized native code should be able to to that in less than 20 ns on my machine). If you want a faster code, then please consider not using this package (or improving its performance).

  • Related