I'm trying to speed up my implementation of
CodePudding user response:
First of all, it is not possible to call pure-Python functions from Numba nopython jitted functions (aka njit functions). This is because Numba needs to track types at compile time to generate an efficient binary.
Moreover, Numba cannot compile the expression `pixel[:, np.newaxis].T` because of `np.newaxis`, which appears not to be supported yet (probably because `np.newaxis` is `None`). You can use `pixel.reshape(3, -1).T` instead.
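As a quick sanity check of that substitution, the following sketch compares the two expressions in plain NumPy (outside Numba). The `pixel` values are made up for illustration; only the shapes and the equality matter.

```python
import numpy as np

# Hypothetical RGB pixel, used only for illustration.
pixel = np.array([10, 20, 30], dtype=np.uint8)

# Original expression: adds an axis, then transposes -> shape (1, 3).
with_newaxis = pixel[:, np.newaxis].T

# Suggested replacement: reshape to (3, 1), then transpose -> shape (1, 3).
with_reshape = pixel.reshape(3, -1).T

print(with_newaxis.shape, with_reshape.shape)
print(np.array_equal(with_newaxis, with_reshape))
```

Both expressions produce the same row vector, so the replacement should not change the result of the surrounding computation.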
Note that you should be careful about the types because doing `a - b` when both variables are of type `np.uint8` results in a possible overflow (e.g. `0 - 1 == 255`, or even more surprising: `0 - 256 == 65280` when `b` is a literal integer and `a` is of type `np.uint8`). Note that the array is computed in-place and that pixels are written before
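The wrap-around is easy to reproduce with small arrays in plain NumPy. The sketch below uses made-up one-element arrays and shows one possible workaround, which is to cast to a wider signed type before subtracting. It covers only the `uint8 - uint8` case; the mixed case with a literal integer depends on the NumPy version's promotion rules and is left out.

```python
import numpy as np

# Hypothetical one-element arrays, used only for illustration.
a = np.array([0], dtype=np.uint8)
b = np.array([1], dtype=np.uint8)

# The result stays uint8, so -1 wraps around modulo 256.
wrapped = a - b
print(wrapped, wrapped.dtype)   # value 255, dtype uint8

# Casting to a signed 32-bit type first keeps the true difference.
safe = a.astype(np.int32) - b.astype(np.int32)
print(safe, safe.dtype)         # value -1, dtype int32
```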
The generated code will not be very efficient, although Numba does a good job. You can iterate over the colors yourself with a loop to find the index of the minimum. This is a bit better because it does not generate many small temporary arrays. You can also specify the types so that Numba compiles the function eagerly, when it is defined, rather than on the first call. That being said, this also makes the code lower-level and so more verbose and harder to maintain.
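To make the trade-off concrete, here is a minimal sketch in plain Python/NumPy (not Numba-compiled) that contrasts an array-based search with an explicit loop. The names `closest_vectorized` and `closest_loop` are hypothetical, and the array-based variant is only an assumption about what such code typically looks like. The palette is the one used in this answer.

```python
import numpy as np

COLORS = np.array([[255, 255, 255], [255, 0, 0], [0, 0, 255],
                   [255, 255, 0], [0, 128, 0], [253, 134, 18]], dtype=np.int32)

def closest_vectorized(pixel):
    # Allocates temporaries: the cast pixel, the difference array,
    # its absolute value and the per-colour sums.
    distances = np.abs(COLORS - pixel.astype(np.int32)).sum(axis=1)
    return int(distances.argmin())

def closest_loop(pixel):
    # Same Manhattan distance, computed on scalars with no temporary arrays.
    r, g, b = (int(v) for v in pixel)
    best_index, best_distance = 0, None
    for i in range(COLORS.shape[0]):
        r2, g2, b2 = (int(v) for v in COLORS[i])
        distance = abs(r - r2) + abs(g - g2) + abs(b - b2)
        if best_distance is None or distance < best_distance:
            best_index, best_distance = i, distance
    return best_index

pixel = np.array([250, 130, 20], dtype=np.uint8)  # hypothetical test pixel
print(closest_vectorized(pixel), closest_loop(pixel))
```

Both functions return the same palette index. In plain Python the loop is not faster; the benefit described above only shows up once the loop is compiled by Numba.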
Here is an optimized implementation:

    import numba as nb
    import numpy as np

    @nb.njit('int32[::1](uint8[::1])')
    def nb_findClosestColour(pixel):
        # Palette stored as int32 so that differences cannot wrap around
        colors = np.array([[255, 255, 255], [255, 0, 0], [0, 0, 255], [255, 255, 0], [0, 128, 0], [253, 134, 18]], dtype=np.int32)
        r, g, b = pixel.astype(np.int32)
        r2, g2, b2 = colors[0]
        minDistance = np.abs(r-r2) + np.abs(g-g2) + np.abs(b-b2)
        shortest = 0
        # Manual loop over the palette (Manhattan distance, no temporary arrays)
        for i in range(1, colors.shape[0]):
            r2, g2, b2 = colors[i]
            distance = np.abs(r-r2) + np.abs(g-g2) + np.abs(b-b2)
            if distance < minDistance:
                minDistance = distance
                shortest = i
        return colors[shortest]

    @nb.njit('uint8[:,:,::1](uint8[:,:,::1])')
    def nb_floydDither(img_array):
        assert(img_array.shape[2] == 3)
        height, width, _ = img_array.shape
        for y in range(0, height-1):
            for x in range(1, width-1):
                old_pixel = img_array[y, x, :]  # a view into img_array, not a copy
                new_pixel = nb_findClosestColour(old_pixel)
                img_array[y, x, :] = new_pixel  # in-place write, also visible through old_pixel
                quant_error = new_pixel - old_pixel
                # Diffuse the error to the right and lower neighbours
                img_array[y, x+1, :] = img_array[y, x+1, :] + quant_error * 7/16
                img_array[y+1, x-1, :] = img_array[y+1, x-1, :] + quant_error * 3/16
                img_array[y+1, x, :] = img_array[y+1, x, :] + quant_error * 5/16
                img_array[y+1, x+1, :] = img_array[y+1, x+1, :] + quant_error * 1/16
        return img_array

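Because the dithering works in place, it matters whether the old pixel is held as a view or as a copy when the new value is written. The sketch below illustrates this in plain NumPy with a hypothetical 1x1 image. As far as I know, Numba's basic slicing follows the same view semantics, but treat that as an assumption to verify.

```python
import numpy as np

# Hypothetical 1x1 RGB image, used only for illustration.
img = np.array([[[200, 100, 50]]], dtype=np.uint8)

old_view = img[0, 0, :]          # view: shares memory with img
old_copy = img[0, 0, :].copy()   # copy: independent snapshot

# In-place write of the "quantized" pixel.
img[0, 0, :] = np.array([255, 255, 255], dtype=np.uint8)
new_pixel = img[0, 0, :].astype(np.int32)

# The view already reflects the new values, so the difference is zero.
error_from_view = new_pixel - old_view.astype(np.int32)
# The copy still holds the old values, so the real difference survives.
error_from_copy = new_pixel - old_copy.astype(np.int32)

print(error_from_view)
print(error_from_copy)
```

If a non-zero quantization error is expected, taking a copy of the old pixel (or computing the error before the write) is one way to get it.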
The naive version is 14 times faster while the last one is 19 times faster.