Efficient function mapping with arguments in numpy-CodePudding

I am trying to create a heightmap by interpolating between a bunch of heights at certain points in an area. To process the whole image, I have the following code snippet:


map_ = np.zeros((img_width, img_height))

for x in range(img_width):
    for y in range(img_height):
        map_[x, y] = calculate_height(set(points.items()), x, y)

This is calculate_height:

def distance(x1, y1, x2, y2) -> float:
    return np.sqrt((x1 - x2) ** 2   (y1 - y2) ** 2)


def calculate_height(points: set, x, y) -> float:
    total = 0
    dists = {}
    for pos, h in points:
        d = distance(pos[0], pos[1], x, y)
        if x == pos[0] and y == pos[1]:
            return h
        d = 1 / (d ** 2)
        dists[pos] = d
        total  = d

    r = 0
    for pos, h in points:
        ratio = dists[pos] / total
        r  = ratio * h

    r = r
    return r

This snippet works perfectly, but if the image is too big, it takes a long time to process, because this is O(n^2). The problem with this, is that a "too big" image is 800 600, and it takes almost a minute to process, which to me seems a bit excessive.

My goal is not to reduce the time complexity from O(n^2), but to reduce the time it takes to process images, so that I can get decently sized images in a reasonable amount of time.

I found this post, but I couldn't really try it out because it's all for a 1D array, and I have a 2D array, and I also need to pass the coordinates of each point and the set of existing points to the calculate_height function. What can I try to optimize this code snippet?

Edit: Moving set(points.items) out of the loop as @thethiny suggested was a HUGE improvement. I had no idea it was such a heavy thing to do. This makes it fast enough for me, but feel free to add more suggestions for the next people to come by!

Edit 2: I have further optimized this processing by including the following changes:

    # The first for loop inside the calculate_distance function
    for pos, h in points:
        d2 = distance2(pos[0], pos[1], x, y)
        if x == pos[0] and y == pos[1]:
            return h
        d2 = d2 ** -1 # 1 / (d ** 2) == d ** -2 == d2 ** -1
        dists[pos] = d2 # Having the square of d on these two lines makes no difference
        total  = d2

This reduced execution time for a 200x200 image from 1.57 seconds to 0.76 seconds. The 800x600 image mentioned earlier now takes 6.13 seconds to process :D

CodePudding user response：

There's a few problems with your implementation.

Essentially what you're implementing is approximation using radial basis functions.

The usual algorithm for that looks like:

sum_w = 0
sum_wv = 0
for p,v in points.items():
   d = distance(p,x)
   w = 1.0 / (d*d)
   sum_w  = w
   sum_wv  = w*v
return sum_wv / sum_w

Your code has some extra logic for bailing out if p==x - which is good. But it also allocates an array of distances - which this single loop form does not need.

This brings execution of an example in a workbook from 13s to 12s.

The next thing to note is that collapsing the points dict into an numpy array gives us the chance to use numpy functions.

points_array = np.array([(p[0][0],p[0][1],p[1]) for p in points.items()]).astype(np.float32)

Then we can write the function as

def calculate_height_fast(points, x, y) -> float:
    dx = points[:,0] - x
    dy = points[:,1] - y
    r = np.hypot(dx,dy)
    w = 1.0 / (r*r)
    sum_w = np.sum(w)
    return np.sum(points[:,2] * w) / np.sum(w)

This brings our time down to 658ms. But we can do better yet...

Since we're now using numpy functions we can apply numba.njit to JIT compile our function.

@numba.njit
def calculate_height_fast(points, x, y) -> float:
    dx = points[:,0] - x
    dy = points[:,1] - y
    r = np.hypot(dx,dy)
    w = 1.0 / (r*r)
    sum_w = np.sum(w)
    return np.sum(points[:,2] * w) / np.sum(w)

This was giving me 105ms (after the function had been run once to ensure it got compiled).

This is a speed up of 130x over the original implementation (for my data)

You can see the full implementations here

CodePudding user response：

This really is a small addition to @MichaelAnderson's detailed answer.

Probably calculate_height_fast() can get faster by optimizing a bit more with explicit looping:

@numba.njit
def calculate_height_faster(points, x, y) -> float:
    dx = points[:, 0] - x
    dy = points[:, 1] - y
    r = np.hypot(dx, dy)
    # compute weighted average
    n = r.size
    sum_w = sum_wp = 0
    for i in range(n):
        w = 1.0 / (r[i] * r[i])
        sum_w  = w
        sum_wp  = points[i, 2] * w
    return sum_wp / sum_w