I am trying to create a heightmap by interpolating between a bunch of heights at certain points in an area. To process the whole image, I have the following code snippet:
map_ = np.zeros((img_width, img_height))
for x in range(img_width):
for y in range(img_height):
map_[x, y] = calculate_height(set(points.items()), x, y)
This is calculate_height
:
def distance(x1, y1, x2, y2) -> float:
return np.sqrt((x1 - x2) ** 2 (y1 - y2) ** 2)
def calculate_height(points: set, x, y) -> float:
total = 0
dists = {}
for pos, h in points:
d = distance(pos[0], pos[1], x, y)
if x == pos[0] and y == pos[1]:
return h
d = 1 / (d ** 2)
dists[pos] = d
total = d
r = 0
for pos, h in points:
ratio = dists[pos] / total
r = ratio * h
r = r
return r
This snippet works perfectly, but if the image is too big, it takes a long time to process, because this is O(n^2). The problem with this, is that a "too big" image is 800 600, and it takes almost a minute to process, which to me seems a bit excessive.
My goal is not to reduce the time complexity from O(n^2), but to reduce the time it takes to process images, so that I can get decently sized images in a reasonable amount of time.
I found
this post, but I couldn't really try it out because it's all for a 1D array, and I have a 2D array, and I also need to pass the coordinates of each point and the set of existing points to the calculate_height
function. What can I try to optimize this code snippet?
Edit: Moving set(points.items)
out of the loop as @thethiny suggested was a HUGE improvement. I had no idea it was such a heavy thing to do. This makes it fast enough for me, but feel free to add more suggestions for the next people to come by!
Edit 2: I have further optimized this processing by including the following changes:
# The first for loop inside the calculate_distance function
for pos, h in points:
d2 = distance2(pos[0], pos[1], x, y)
if x == pos[0] and y == pos[1]:
return h
d2 = d2 ** -1 # 1 / (d ** 2) == d ** -2 == d2 ** -1
dists[pos] = d2 # Having the square of d on these two lines makes no difference
total = d2
This reduced execution time for a 200x200 image from 1.57 seconds to 0.76 seconds. The 800x600 image mentioned earlier now takes 6.13 seconds to process :D
CodePudding user response:
There's a few problems with your implementation.
Essentially what you're implementing is approximation using radial basis functions.
The usual algorithm for that looks like:
sum_w = 0
sum_wv = 0
for p,v in points.items():
d = distance(p,x)
w = 1.0 / (d*d)
sum_w = w
sum_wv = w*v
return sum_wv / sum_w
Your code has some extra logic for bailing out if p==x
- which is good.
But it also allocates an array of distances - which this single loop form does not need.
This brings execution of an example in a workbook from 13s to 12s.
The next thing to note is that collapsing the points dict into an numpy array gives us the chance to use numpy functions.
points_array = np.array([(p[0][0],p[0][1],p[1]) for p in points.items()]).astype(np.float32)
Then we can write the function as
def calculate_height_fast(points, x, y) -> float:
dx = points[:,0] - x
dy = points[:,1] - y
r = np.hypot(dx,dy)
w = 1.0 / (r*r)
sum_w = np.sum(w)
return np.sum(points[:,2] * w) / np.sum(w)
This brings our time down to 658ms. But we can do better yet...
Since we're now using numpy functions we can apply numba.njit
to JIT compile our function.
@numba.njit
def calculate_height_fast(points, x, y) -> float:
dx = points[:,0] - x
dy = points[:,1] - y
r = np.hypot(dx,dy)
w = 1.0 / (r*r)
sum_w = np.sum(w)
return np.sum(points[:,2] * w) / np.sum(w)
This was giving me 105ms (after the function had been run once to ensure it got compiled).
This is a speed up of 130x over the original implementation (for my data)
You can see the full implementations here
CodePudding user response:
This really is a small addition to @MichaelAnderson's detailed answer.
Probably calculate_height_fast()
can get faster by optimizing a bit more with explicit looping:
@numba.njit
def calculate_height_faster(points, x, y) -> float:
dx = points[:, 0] - x
dy = points[:, 1] - y
r = np.hypot(dx, dy)
# compute weighted average
n = r.size
sum_w = sum_wp = 0
for i in range(n):
w = 1.0 / (r[i] * r[i])
sum_w = w
sum_wp = points[i, 2] * w
return sum_wp / sum_w